ML-focused Site Reliability Engineer - Developer Platforms
Adobe.com
Office
Bucharest, Romania
Full Time
Our Company
Changing the world through digital experiences is what Adobe’s all about. We give everyone—from emerging artists to global brands—everything they need to design and deliver exceptional digital experiences! We’re passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.
We’re on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!
The Opportunity
We have a fantastic opportunity for an ML-focused Site Reliability Engineer to join our Developer Platforms team based in Bucharest.
We are looking for an engineer with hands-on experience in machine learning, including designing and training models for real-world applications. The ideal candidate will play a crucial role in developing and implementing anomaly detection systems to proactively spot and address operational issues in intricate infrastructures. This role demands a strong understanding of AI Ops methodologies to optimize performance, automate incident response, and enhance system reliability. Candidates should be enthusiastic about using data to drive intelligent automation and improve service resilience at scale.
The Role:
- Build outstanding things that matter. You’ll work on a critical growth initiative, solving problems for engineers and customers.
- Grow. Sharpen your skills, use innovative technology, and collaborate with your peers.
- Collaborate. Work in an environment that values collaboration.
What You'll Do:
- Ensure the highest level of uptime and Quality of Service (QoS) to Adobe’s customers through operational excellence
- Architect and build an AI anomaly detection system that works on Adobe’s observability data at scale, partnering with other teams to work across boundaries
- Define service level objectives (SLOs) and service level indicators (SLIs) to represent and measure service quality
- Identify areas to improve service resiliency through techniques such as chaos engineering, performance/load testing, and anomaly detection
- Support and maintain globally distributed multi-cloud (public and/or private) environments
- Automate common, repeatable tasks at a large scale to reduce toil
- Tackle performance and stability issues using a wide variety of tools
- Participate in an on-call rotation as required
- Determine the root cause for all production level incidents and write corresponding high-quality RCA reports
What You'll Need to Succeed:
- Hands-on experience with AI anomaly detection and training models
- Expertise in MCP integration; experience with MCP-to-MCP communication is a nice-to-have
- Understanding of how to fine-tune signals from observability systems so that our AI capabilities can scale to production data
- Deep understanding of both software engineering and technical operations
- DevOps skills (Scrum/Kanban/Agile/CI-CD/12-Factor)
- Experience in modern cloud-based, SaaS delivery technologies: AWS, Azure, Jenkins, Git, Atlassian Jira and Confluence, Linux, DNS, E-mail, containers, log analysis, monitoring, Java, Apache, Tomcat, Memcached, Qpid, and MySQL on Linux, Prometheus, Grafana, New Relic, Splunk.
- Expertise with container orchestration engines (Kubernetes)
- Programming skills, particularly with Python, Java, and Ruby
- Applied skills in machine learning
- Excellent communication, interpersonal, and teamwork skills
- Familiarity with a variety of cloud and automation concepts, practices, and procedures
Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other applicable characteristics protected by law.
Adobe aims to make Adobe.com accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call (408) 536-3015.