Engineering Manager: SRE

AuthZed.com

Hybrid

United States

Full Time

About AuthZed:

We are the creators and maintainers of SpiceDB and the authorization infrastructure that companies around the world depend on to keep their engineering teams focused on what matters most: their own product.

We are a Series A company, fixing broken access control with products that eliminate complex permission management while delivering enterprise-scale performance and consistent access control.

AuthZed is a fully remote company with employees across the US, Canada, and Europe. We’re a hardworking and close-knit group with a software-driven culture (yep, even our GTM team understands and loves this technology)! We bring integrity to all our interactions, fostering confidence in decision-making, trusting and respecting each voice on our team, every day.

Company Values:

Agency: Everyone should have the capability, freedom, and confidence to bring about changes to our business and product. Organizational processes exist to clearly define our goals, but not restrict how progress is made.
Collaboration: Success is defined in various dimensions and no single person can be an expert in all of them. Without valuing the opinions of others, finding compromises, and sharing mutual trust and respect, you cannot arrive at the best possible solution.
Open-mindedness: Without asking questions, testing assumptions, and questioning our pre-existing biases, we risk operating within an echo chamber. We celebrate the representation of diverse perspectives and backgrounds as a catalyst for creating an inclusive work environment that everyone can appreciate.

About the Role:

At AuthZed, we’re revolutionizing how modern applications handle access control, and reliability is at the heart of what we do. As an SRE Manager, you’ll lead the team responsible for ensuring the reliability, scalability, and performance of AuthZed’s infrastructure as we grow our global customer base.

This is a hands-on leadership role: you’ll manage and grow a team of SREs while remaining deeply engaged with production systems, incident response, and platform architecture. You’ll use a blend of Site Reliability Engineering and Platform Engineering to reduce operational toil, improve safety, and enable product teams to ship reliably at scale.

You’ll work with cutting-edge technologies, design resilient systems, and build automation and paved paths so customers can rely on AuthZed for their most critical workloads.

What you’ll own:

Lead a global team of Site Reliability Engineers delivering infrastructure automation, observability, and operational scalability across multi-cloud and multi-region Kubernetes-based architectures.
Recruit, hire, onboard, and develop engineers while elevating the overall strength of the team.
Act as a player-coach by contributing to critical projects while mentoring engineers and supporting their professional growth.
Participate in on-call rotations at a sustainable level to stay grounded in real operational issues.
Guide project planning by defining milestones, identifying dependencies, and working toward timely and meaningful delivery.
Identify toil and lead initiatives to eliminate it through engineering solutions.
Drive automation and platform engineering: safer deploys, progressive delivery, guardrails, and paved paths that reduce toil.
Collaborate with product and engineering to ship features like self-service workflows and infra-as-code expectations with reliability baked in.
Serve as a senior escalation point for complex incident triage and root cause analysis.

What you bring:

10+ years of experience in infrastructure, SRE, or platform engineering roles.
5+ years of team management or technical leadership in SRE or Platform Engineering.
Experience managing distributed teams across US, Canada, EU, and global time zones.
Experience leading or mentoring SRE/Infrastructure/Platform teams in a production SaaS environment. Strong leadership skills with the ability to mentor and coach senior-level engineers.
Strong grasp of SRE fundamentals: SLOs/SLIs, error budgets, incident management, capacity planning, and operational excellence.
Extensive experience with AWS, GCP, and Azure managed services.
Strong programming skills and experience writing production-quality automation or tooling (e.g., Go, Python, Bash).
Hands-on experience with Kubernetes, Kubernetes Operators/Controllers, containerized workloads, and Infrastructure as Code (Terraform, Pulumi).
Experience with monitoring and observability systems (e.g., Prometheus, Grafana, logging/tracing pipelines).
Excellent communication: can translate reliability tradeoffs to product/exec stakeholders and write crisp incident/postmortem artifacts.
Proven ability to translate operational pain points into engineering deliverables.

Extra shine:

Experience working with or integrating AI-powered systems or tooling.
Experience operating multi-tenant or high-isolation customer environments.
Familiarity with distributed databases and performance tuning at scale.
Experience building internal developer platforms or paved paths.

Life at AuthZed:

Opportunity to work with cutting-edge technology in a rapidly growing sector.
A supportive environment where your ideas lead to real impact.
Competitive salary based on experience.
Stock options at an early-stage startup.
Comprehensive benefits including healthcare (US-based) and other insurance.
A fully remote, flexible schedule that accommodates different time zones.
Twice-yearly travel for team offsites focused on team bonding, collaboration, and having fun!