Senior Site Reliability Engineer - Observability
Moderna.com
Office
Mazowieckie
Full Time
The Role:
Joining Moderna offers the unique opportunity to be part of a pioneering team that's revolutionizing medicine through mRNA technology, with a diverse pipeline of development programs across various diseases.
As an employee, you'll be part of a continually growing organization, working alongside exceptional colleagues and strategic partners worldwide, contributing to global health initiatives.
Moderna's commitment to advancing the technological frontier of mRNA medicines ensures a challenging and rewarding career experience, with the potential to make a significant impact on patients' lives worldwide.
Moderna is solidifying its presence within our international business services hub in Warsaw, Poland, a city renowned for its rich scientific and technological heritage. This hub provides critical functions, meeting the growing demand of Moderna’s global business operations. We're inviting professionals from around the world to join our mission and contribute to the future of mRNA medicines.
We’re seeking a Senior Site Reliability Engineer – Observability with deep expertise in designing, building, and operating observability solutions across application, database, host, and container environments. In this role, you will lead the development of a modern, open-source observability platform – leveraging technologies such as Grafana, Prometheus, or similar – that is scalable, resilient, and cost-effective. This platform will form the foundation for enterprise-wide monitoring and log management, empowering teams to gain actionable insights, optimize performance, and improve system reliability.
This is a high-impact role for a self-starter who takes initiative and drives outcomes, with ownership spanning observability platforms, governance, agent fleet management, automation, and FinOps practices – shaping how Moderna advances its observability strategy in a rapidly growing global enterprise.
Here’s What You’ll Do
Your Key Responsibilities Will Be
Platform Ownership & Operations
- Manage and advance Moderna’s enterprise observability platform with a focus on open-source and SaaS observability technologies (Grafana, Prometheus, Loki, Tempo, Jaeger, OpenTelemetry, Dynatrace, Splunk, etc.).
- Lead governance, agent fleet management, and FinOps optimization to ensure the platform is scalable, cost-effective, and compliant with enterprise requirements.
- Balance hands-on engineering work (building, configuring, and operating the platform) with strategic ownership (roadmap influence, governance, cost optimization).
- Collaborate with vendors and open-source communities to influence feature roadmaps and maximize platform value.
Observability Engineering
- Design and build highly scalable, resilient, and cost-optimized observability architectures to support application, database, host, and container monitoring.
- Implement telemetry pipelines for metrics, traces, and logs using Grafana, Prometheus exporters (e.g., Node, Blackbox), Kubernetes instrumentation, distributed tracing, or similar technologies.
- Establish and evolve best practices for monitoring, alerting, SLOs/SLIs, and incident detection across hybrid environments (cloud-native and on-prem).
- Partner with application and infrastructure teams to enable self-service observability capabilities, accelerating troubleshooting and reliability improvements.
Log Management
- Build and maintain enterprise-scale log management capabilities within the observability platform.
- Evolve log management to serve as a scalable, cost-effective alternative to traditional log aggregation solutions.
- Partner with security and infrastructure teams to ensure logging meets performance, compliance, and retention requirements.
Incident Response & Collaboration
- Integrate observability solutions with incident management platforms such as PagerDuty to streamline escalation, response, and workflow automation.
- Oversee and optimize on-call processes, ensuring alerts are actionable, routed effectively, and resolved quickly.
- Provide real-time telemetry during incidents and support root cause analysis (RCA) backed by observability data.
Automation & Integration
- Develop automation using Python, Terraform, Ansible, and CI/CD pipelines to streamline observability workflows.
- Implement self-healing mechanisms and automated remediation for recurring reliability issues.
- Ensure integrations with enterprise platforms, including PagerDuty, ServiceNow, and Jira, to enhance incident, change, and problem management.
Analytics & Reporting
- Deliver dashboards and reporting that give both engineers and leadership actionable visibility into system health, reliability, and costs.
- Track and report key metrics such as MTTA, MTTR, error rates, and cost per workload.
Knowledge Sharing & Continuous Improvement
- Create documentation, runbooks, and training to support adoption and consistency across engineering teams.
- Participate in post-incident reviews, applying lessons learned to refine monitoring strategies and prevent recurrence.
- Promote a culture of continuous learning, improvement, and observability adoption across the enterprise.
The key Moderna Mindsets you’ll need to succeed in the role:
- We behave like owners. You’ll be a self-starter who takes initiative and drives outcomes, going beyond assigned tasks to deliver platforms that create long-term value.
- We act with urgency. Action today compounds the lives saved tomorrow. You will proactively optimize observability tools and workflows to enhance system performance and reliability.
- We obsess over learning. We don’t have to be the smartest – we have to learn the fastest. In this role, you will continuously refine monitoring strategies based on real-time data and incident response learnings.
Here’s What You’ll Need (Basic Qualifications)
- 7+ years of experience in site reliability engineering, observability, or platform engineering.
- Extensive expertise in managing and administering SaaS (Dynatrace, Splunk, or similar) or open-source observability platforms, including governance, agent fleet management, and cost optimization.
- Proven experience designing and building scalable, resilient, and cost-effective observability platforms using Prometheus, Grafana, Node/Blackbox Exporters, Kubernetes, or similar.
- Strong knowledge of observability practices (metrics, logs, traces, SLO/SLI design) across complex, large-scale enterprise environments.
- Hands-on experience with incident management platforms such as PagerDuty and ITSM integrations (ServiceNow, Jira).
- Proficiency in automation and infrastructure-as-code (Python, Terraform, Ansible, Bash).
- Experience monitoring and troubleshooting hybrid and cloud-native environments (AWS, Azure, or GCP).
- Strong problem-solving skills and the ability to operate in a fast-paced, global environment.
- Demonstrated ability to take initiative, work independently, and drive outcomes in complex enterprise environments.
Here’s What You’ll Bring to the Table (Preferred Qualifications):
- Experience working in biotech, pharmaceutical, healthcare, or other regulated environments (e.g., GxP, HIPAA).
- Experience with enterprise-scale log management (e.g., Loki, Elastic, Splunk) and retention/cost optimization.
- Familiarity with ITSM processes and integrations with observability solutions.
- Relevant certifications in AWS, Azure, Dynatrace, Splunk or related observability technologies.
- A proactive, innovative mindset with a passion for open-source adoption, continuous improvement, and automation.
At Moderna, we believe that when you feel your best, you can do your best work. That’s why our global benefits and well-being resources are designed to support you—at work, at home, and everywhere in between.
- Quality healthcare and insurance benefits
- Lifestyle Spending Accounts to create your own pathway to well-being
- Free premium access to fitness, nutrition, and mindfulness classes
- Family planning and adoption benefits
- Generous paid time off, including vacation, bank holidays, volunteer days, sabbatical, global recharge days, and a discretionary year-end shutdown
- Savings and investments
- Location-specific perks and extras!
The benefits offered may vary depending on the nature of your employment with Moderna and the country where you work.
About Moderna
Since our founding in 2010, we have aspired to build the leading mRNA technology platform, the infrastructure to reimagine how medicines are created and delivered, and a world-class team. We believe in giving our people a platform to change medicine and an opportunity to change the world.
By living our mission, values, and mindsets every day, our people are the driving force behind our scientific progress and our culture. Together, we are creating a culture of belonging and building an organization that cares deeply for our patients, our employees, the environment, and our communities.
We are proud to have been recognized as a Science Magazine Top Biopharma Employer, a Fast Company Best Workplace for Innovators, and a Great Place to Work in the U.S.
As we build our company, we have always believed an in-person culture is critical to our success. Moderna champions the significant benefits of in-office collaboration by embracing a 70/30 work model. This 70% in-office structure helps to foster a culture rich in innovation, teamwork, and direct mentorship. Join us in shaping a world where every interaction is an opportunity to learn, contribute, and make a meaningful impact.
If you want to make a difference and join a team that is changing the future of medicine, we invite you to visit modernatx.com/careers to learn more about our current opportunities.
Moderna is a smoke-free, alcohol-free, and drug-free work environment.
Moderna is a place where everyone can grow. If you meet the Basic Qualifications for the role and you would be excited to contribute to our mission every day, please apply!
Moderna is committed to equal opportunity in employment and non-discrimination for all employees and qualified applicants without regard to a person's race, color, sex, gender identity or expression, age, religion, national origin, ancestry or citizenship, ethnicity, disability, military or protected veteran status, genetic information, sexual orientation, marital or familial status, or any other personal characteristic protected under applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.
We’re focused on attracting, retaining, developing, and advancing our employees. By cultivating a workplace that values diverse experiences, backgrounds, and ideas, we create an environment where every employee can contribute their best.
Moderna is committed to offering reasonable accommodation or adjustments to qualified job applicants with disabilities. Any applicant requiring an accommodation or adjustment in connection with the hiring process and/or to perform the essential functions of the position for which the applicant has applied should contact the Accommodations and Adjustments team at leavesandaccommodations@modernatx.com.
October 6, 2025