Site Reliability Engineer II

Atlan.com

Hybrid

India

Full Time

Data is at the core of modern business, yet many teams struggle with its overwhelming volume and complexity. At Atlan, we’re changing that. As the world’s first active metadata platform, we help organisations transform data chaos into clarity and seamless collaboration.

From Fortune 500 leaders to hyper-growth startups, from automotive innovators redefining mobility to healthcare organisations saving lives, and from Wall Street powerhouses to Silicon Valley trailblazers — we empower ambitious teams across industries to unlock the full potential of their data.

Recognised as leaders by Gartner and Forrester and backed by Insight Partners, Atlan is at the forefront of reimagining how humans and data work together. Joining us means becoming part of a movement to shape a future where data drives extraordinary outcomes.

Why This Role Matters 🔗

As a key member of Atlan’s Platform & Reliability Engineering Team, your core responsibility will be to strengthen our alert management and incident response capabilities, ensuring every customer experience remains fast, reliable, and uninterrupted.

Whether you’re handling production incidents, automating operational workflows, or enhancing observability and monitoring, your work will directly contribute to Atlan’s mission of empowering modern data teams with a resilient and seamless platform.

At Atlan, we’re building high-performance, reliability-driven engineering teams across every function — and this role is foundational. We’re looking for curious, self-driven engineers who thrive under pressure, love solving real-world reliability challenges, and are passionate about keeping systems stable as we scale globally.

We value engineers who use data, automation, and deep systems thinking to make reliability a core part of how we build and operate: not just a function, but a culture.

Your Mission At Atlan 🌟

  • Own and operate end-to-end reliability for critical systems — from alert triage and incident resolution to long-term preventive improvements.
  • Proactively manage incidents within defined SLAs (60 mins for Critical, 180 mins for High) and ensure smooth collaboration across teams during resolution.
  • Enhance observability by improving monitoring systems, refining alerts, and reducing noise to focus on what truly matters.
  • Automate operations and incident workflows to eliminate manual toil, improving speed, consistency, and reliability.
  • Collaborate across teams — work with Platform, Observability, and Product Engineering teams to strengthen uptime and service stability.
  • Contribute to documentation and playbooks, ensuring that every incident drives learning, process improvement, and team efficiency.

What Makes You A Great Fit 😍

  • Proven experience managing alerts, incidents, and root cause analyses in production environments.
  • Hands-on knowledge of cloud platforms (AWS, GCP, or Azure) and Kubernetes — including networking, deployments, and troubleshooting.
  • Familiarity with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK, or Datadog.
  • Ability to automate repetitive operational tasks using scripting (Python, Bash, or another shell).
  • Strong communication and collaboration skills — especially in distributed or remote-first teams.
  • A mindset of ownership, curiosity, and calm under pressure — you thrive in incident response and turn challenges into learning opportunities.

Why You’ll Love Working Here 💙

  • Real impact from Day 1: Your work directly shapes reliability for thousands of users across the globe.
  • Modern tech stack: Work with cutting-edge tools — Kubernetes, Terraform, Prometheus, Datadog, and more.
  • Learning culture: Collaborate with world-class platform engineers and senior SREs who believe in mentorship and continuous growth.
  • Autonomy & trust: Freedom to experiment, improve, and own your work end-to-end.
  • Clear growth path: Grow from SRE II → Senior SRE → Senior SRE II → Staff SRE → Principal SRE as you expand your technical depth and ownership scope.

Join Us If You Want To...

  • Help build the backbone of Atlan’s global data platform.
  • Turn reactive operations into proactive reliability.
  • Be part of a culture that treats reliability not as a checklist but as a craft.

Why Atlan for You?

At Atlan, we believe the future belongs to the humans of data. From curing diseases to advancing space exploration, data teams are powering humanity's greatest achievements. Yet, working with data can be chaotic—our mission is to transform that experience. We're reimagining how data teams collaborate by building the home they deserve, enabling them to create winning data cultures and drive meaningful progress.

Joining Atlan Means:

  1. Ownership from Day One: Whether you're an intern or a full-time teammate, you’ll own impactful projects, chart your growth, and collaborate with some of the best minds in the industry.
  2. Limitless Opportunities: At Atlan, your growth has no boundaries. If you’re ready to take initiative, the sky’s the limit.
  3. A Global Data Community: We’re deeply embedded in the modern data stack, contributing to open-source projects, sponsoring meet-ups, and empowering team members to grow through conferences and learning opportunities.

As a fast-growing, fully remote company trusted by global leaders like Cisco, Nasdaq, and HubSpot, we’re creating a category-defining platform for data and AI governance. Backed by top investors, we’ve achieved 7X revenue growth in two years and are building a talented team spanning 15+ countries.

If you’re ready to do your life’s best work and help shape the future of data collaboration, join Atlan and become part of a mission to empower the humans of data to achieve more, together.

We are an equal opportunity employer
At Atlan, we’re committed to helping data teams do their lives’ best work. We believe that diversity and authenticity are the cornerstones of innovation, and by embracing varied perspectives and experiences, we can create a workplace where everyone thrives. Atlan is proud to be an equal opportunity employer and does not discriminate based on race, color, religion, national origin, age, disability, sex, gender identity or expression, sexual orientation, marital status, military or veteran status, or any other characteristic protected by law.

October 9, 2025

Atlan

Atlan.com

AtlanHQ