Senior Observability & Monitoring Engineer (SRE/DevOps)
Cyberark.com
Office
Petah Tikva, Central District, Israel
Full Time
Company Description
About CyberArk:
CyberArk (NASDAQ: CYBR) is the global leader in Identity Security. Centered on privileged access management, CyberArk provides the most comprehensive security offering for any identity – human or machine – across business applications, distributed workforces, hybrid cloud workloads and throughout the DevOps lifecycle. The world’s leading organizations trust CyberArk to help secure their most critical assets. To learn more about CyberArk, visit our CyberArk blogs or follow us on X, LinkedIn or Facebook.
Job Description
We are seeking an experienced Observability & Monitoring Engineer (SRE/DevOps) to lead reliability, observability, and performance efforts for our most critical applications. This role bridges development, operations, and product, ensuring that our systems are robust and scalable and that they drive superior business outcomes. The Senior Observability & Monitoring Engineer will design and optimize monitoring strategies, automate operational tasks, and serve as a technical mentor for reliability within the R&D organization.
Key Responsibilities:
- Architect, implement, and maintain advanced monitoring, logging, and alerting solutions using Datadog (mandatory), covering infrastructure, application, and business-level metrics.
- Lead and optimize reliability, performance, and scalability efforts for PostgreSQL, Redis, SQS, Kubernetes, and cloud-native environments.
- Design, build, and maintain automations for operational tasks, deployments, and remediations (Infrastructure-as-Code, CI/CD, self-healing workflows).
- Mentor engineers on reliability engineering best practices, monitoring usage, and troubleshooting methodologies.
- Lead knowledge sharing by producing high-quality documentation, technical presentations, and internal training.
- Perform capacity planning and performance tuning, and proactively address potential bottlenecks or scaling issues.
- Stay current with SRE, DevOps, and cloud trends; evaluate and recommend new tools and approaches for continuous improvement.
#Li-Hybrid
#Li-Cr1
Qualifications
- 7+ years of experience in SRE, DevOps, or production engineering roles supporting large-scale distributed systems.
- Expertise in architecting and operating monitoring, tracing, and alerting with Datadog (including custom metrics, dashboards, and advanced alerting techniques).
- Experience with additional monitoring/observability platforms (e.g., Prometheus, Grafana, ELK stack).
- Hands-on knowledge of PostgreSQL, Redis, SQS, and Kubernetes (deployment, troubleshooting, scaling, and performance optimization).
- Advanced scripting/programming skills with Python, Bash, or another relevant language.
- Track record of designing and implementing automated solutions (Infrastructure-as-Code, CI/CD pipelines, auto-remediation).
- Strong communication skills, including technical writing, documentation, and presentation to diverse technical audiences.
- Experience working closely with development, product, and architecture teams to embed reliability from the design phase.
- Fluent technical English.
Preferred Qualifications:
- Strong familiarity with SaaS, microservices architectures, and security best practices.
- Cloud certifications (e.g., AWS Certified Solutions Architect, GCP Professional Cloud Engineer) are a plus.
- Deep experience with chaos engineering, performance/load testing, and continuous improvement frameworks.
- Demonstrated ability to mentor engineers, promote reliability culture, and foster knowledge sharing.
September 28, 2025