Senior Site Reliability Engineer
Zillow.com
Office
Mexico City
Full Time
About The Team
The Touring and Connections (TCE) EngOps team drives reliability, scalability, and operational excellence across our engineering platforms. We partner with product, infrastructure, and development teams to ensure systems are performant, observable, secure, and cost-efficient. This team resides within the Touring & Connections organization, which develops consumer-facing features that are central to Zillow’s growth strategy. The development teams we support are responsible for:
Real-time Tours (RTT): delivering seamless touring experiences across mobile and web, eliminating the back-and-forth of scheduling and giving home shoppers the transparent, efficient experience they expect.
Zillow In-App Messaging (ZIM): enabling direct, real-time communication between shoppers and agents, making interactions faster, more convenient, and more engaging.
Agent–buyer connections: helping match buyers with the best possible agent, ensuring they get trusted guidance and expertise as they navigate the market.
EngOps supports Zillow development teams in delivering quality, reliable software quickly and confidently. We combine technical program management (TPM), software quality engineering (SQE), and site reliability engineering (SRE) to drive shift-left practices, improve automation and observability, and reduce operational friction. Our work enables teams to build secure, resilient systems that are dependable and meet customer expectations.
About The Role
As a Senior Site Reliability Engineer joining TCE EngOps, you will design, build, and operate the systems and tooling that ensure the availability and reliability of critical services. You’ll lead initiatives in observability, incident management, infrastructure automation, and performance optimization. In this role, you’ll collaborate closely with development teams, promote SRE best practices, and mentor peers to strengthen reliability culture across the organization.
You will participate in an L3 on-call rotation, driving rapid recovery during incidents and championing systemic improvements afterward. You’ll also explore and apply emerging technologies, including AI-driven practices and tooling, to continuously improve reliability, automation, and developer experience. This position emphasizes both hands-on engineering and coaching/enablement, helping uplift the reliability capabilities of the broader engineering organization.
Responsibilities
- Own the reliability, scalability, and performance of production services.
- Define and implement SLOs/SLAs, error budgets, and capacity planning.
- Design and evolve monitoring, alerting, and observability dashboards with tools such as Prometheus, Grafana, and Datadog.
- Participate in incident response, blameless postmortems, chaos testing, and systemic remediation.
- Drive safe release practices, including canary and blue-green deployments, rollback automation, and CI/CD improvements.
- Provide performance and load testing tooling so developers can validate scalability and efficiency.
- Apply cost optimization strategies to improve cloud spend efficiency.
- Build and manage Infrastructure as Code with Terraform.
- Operate and scale containerized services with Docker and Kubernetes.
- Automate workflows and tooling using Python, Go, and Bash.
- Implement cloud best practices in AWS (EC2, VPC, IAM, S3, Route 53).
- Promote shift-left reliability practices through pre-launch reviews, CI quality gates, and risk identification.
- Mentor, coach, and embed with engineering teams to share SRE practices and build reliability maturity.
In addition to a competitive base salary and benefits, this position is also eligible for equity awards based on factors such as experience, performance and location.
Who You Are
- 8+ years of SRE, DevOps, or Platform Engineering experience.
- Proven expertise in designing SLOs, monitoring strategies, and incident response frameworks.
- Strong proficiency with Terraform, GitLab CI/CD, and cloud infrastructure (AWS).
- Hands-on experience with Kubernetes and Docker.
- Skilled in Python, Go, or Bash for automation and tooling.
- Experienced with Prometheus, Grafana, Datadog, or Splunk for observability.
- Deep understanding of networking, security practices, and cloud cost optimization.
- Strong collaborator with experience in developer enablement, coaching, and knowledge sharing.
- Excellent communicator who values blamelessness, automation, and continuous improvement.
- Committed to continuous learning and exploration of emerging technologies, including AI and automation, to drive reliability excellence.
Get To Know Us
Zillow is reimagining real estate to make it easier to unlock life’s next chapter.
As the most-visited real estate website in the United States, Zillow® and its affiliates help movers find and win their home through digital solutions, first-class partners, and easier buying, selling, financing, and renting experiences. Millions of people visit Zillow Group sites every month to start their home search, and now they can rely on Zillow to help make it easier to move. The work we do is helping people move from dreaming to transacting, and no matter what job you're in, you will play a critical role in making this vision a reality.
Our efforts to streamline the real estate transaction are supported by a deep-rooted culture of innovation, our passion to redefine the employee experience, and a fundamental commitment to Equity and Belonging. We’re also setting the standard for work experiences of the future, where our employees are supported in doing their best work and living a flexible, well-balanced life. But don’t just take our word for it. Read recent reviews on Glassdoor and recent recognition from multiple organizations, including the 100 Best Companies to Work For in 2022 list; the Glassdoor Employees’ Choice Award, honoring the Best Places to Work in 2022; the Bloomberg Gender-Equality Index 2022; the Human Rights Campaign (HRC) Corporate Equity Index and Best Place to Work for LGBTQ Equality 2022; and the TIME 100 Most Influential Companies list.
Zillow Group is an equal opportunity employer committed to fostering an inclusive, innovative environment with the best employees. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, and gender identity. If you have a disability or special need that requires accommodation, please contact your recruiter directly.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable state and local law.
October 9, 2025