Site Reliability Engineering Manager

Shippo.com

192k - 261k USD/year

Hybrid

United States

Full Time

Here at Shippo, we are the shipping layer of the internet and we consider ourselves to be one of the core building blocks of e-commerce.
Our mission is to make merchants successful through world-class shipping. With our products and solutions, we level the playing field by providing our customers with best-in-class solutions that otherwise wouldn’t be available to them. Through Shippo, e-commerce businesses, marketplaces, platforms, and a variety of logistics infrastructure providers are able to connect to shipping carriers around the world from one API and dashboard. We provide our customers with the most competitive shipping rates, label printing, automated international documents, shipment tracking, a facilitated returns process, and more.
How we’ll deliver success:
As the SRE Manager at Shippo, you will lead a team of engineers responsible for building platforms, tooling, and infrastructure that enable product teams to operate reliable, performant, and scalable services. You will establish frameworks for observability, deployment automation, and infrastructure management that allow product teams to own their service reliability. You will maintain a strong, support-oriented team while building automation and enabling engineering productivity and operational excellence across the organization.

Responsibilities

Lead and develop a team of platform-focused SRE engineers, providing technical mentorship, career development, and performance management while fostering a culture of automation, self-service, and continuous improvement
Build and maintain internal platforms and tooling that enable product teams to deploy, monitor, and operate their services reliably
Manage observability platforms (metrics, logs, traces, dashboards) that provide product teams visibility into their services
Own the infrastructure and Kubernetes platform that all Shippo services run on, ensuring it scales ahead of business needs through capacity planning and performance optimization
Establish frameworks and tooling for SLO/SLI definition, error budget tracking, and reliability measurement that product teams can adopt
Design and maintain CI/CD pipelines, deployment automation, and release tooling that enable safe, frequent deployments
Build infrastructure-as-code foundations and self-service capabilities that allow product teams to provision and manage their infrastructure
Create automation to eliminate toil and prevent infrastructure problems before they impact product teams
Drive infrastructure cost optimization initiatives through analysis, rightsizing recommendations, reserved capacity planning, and waste elimination across the cloud platform
Participate in leadership rotation for Sev1 incidents affecting services or the platform itself
Manage the SRE team’s on-call rotation
Design, implement, and test disaster recovery capabilities and ensure infrastructure security and compliance
Partner with Engineering Managers and TPMs to understand product team needs, prioritize platform investments, and communicate platform roadmap and capabilities
Establish platform SLOs for infrastructure reliability, deployment success rates, build times, and other developer experience metrics

Requirements

3+ years of hands-on engineering management experience
9+ years as a software or systems engineer with deep experience building platforms, tooling, or infrastructure
BS or MS degree in Computer Science or equivalent experience
Expert-level experience designing and operating platforms that enable other engineering teams (internal platform-as-a-product experience)
Strong operational experience with Kubernetes in production environments, including experience building Kubernetes platforms for application teams
Deep expertise with at least one public cloud provider (AWS, GCP) including networking, compute, storage, and managed services
Experience building or maintaining CI/CD systems and deployment automation (GitHub Actions, GitLab CI, ArgoCD, Flux, etc.)
Strong background in infrastructure-as-code tools and patterns (Terraform, Pulumi, CloudFormation, etc.)
Experience designing and implementing observability platforms (Prometheus, Grafana, ELK stack, Datadog, New Relic, etc.)
Proficiency in at least one programming language for tooling and automation (Python, Go, or similar)
Experience establishing reliability frameworks (SLO/SLI/error budgets) that other teams can adopt
Understanding of developer experience and ability to build self-service tooling that reduces friction
Track record of designing disaster recovery solutions and implementing security and compliance best practices for infrastructure
Exceptional verbal, written, and interpersonal communication skills with ability to influence product teams and engineering leadership
Deep understanding of enabling product team success through platform capabilities

What's In The Shippo Package?

Healthcare coverage for medical, dental, and vision
Take-as-much-as-you-need vacation policy & flexible working
One week-long, company-wide winter shutdown
3 Volunteer Days Off (VTOs)
WFH stipend to set up your home office
Charity donation match up to $100
Dedicated programs, coaching, tools, and resources for your professional and career growth as well as an individual learning stipend for your personal and focused growth
Fun in-person team time through our Shippos Everywhere program, which includes regular team and company off-sites throughout the year as well as local Shippos gatherings

Our Compensation Shippolicy:
We believe compensation is a custom experience and are committed to fair and equitable compensation practices. The standard base pay range for this role is a minimum of $192k to a maximum of $261k in annual salary. Since we are focused on hiring Shippos Everywhere, we have 2 US pay ranges: a standard compensation range for the majority of the US and a standard +1 compensation range for those who live in areas where the cost of labor is higher, such as NYC and California. The actual base pay is dependent upon many factors, such as financial budgets, work experience, training, transferable skills, business needs, and market value. The base pay salary ranges are subject to change and may be modified in the future. Total compensation for this role will include equity, medical, dental, vision, and other benefits noted in our Shippos “package” section.
Sail through the process:
Here at Shippo, we celebrate inclusivity and are committed to creating equal access to opportunities for people from all backgrounds, perspectives, and geographies. These values define who we are and everything we do. All qualified individuals are encouraged to apply. If you need assistance or a reasonable accommodation during the application and recruiting process, please contact us at accommodations@goshippo.com
Shippos in the wild:
Our people, much like the packages we help ship, are all over the world. This means that, through our remote-first program, “Shippos Everywhere”, our roles can be based anywhere in the US with the exception of Delaware, Nevada, Ohio, Oregon, Hawaii, New Mexico, and West Virginia, and many roles can be based internationally. For locations outside of the US and Ireland, the employment contracts are powered by Remote.com (all Shippo perks still apply, including equity!). What we want to emphasize is that you can be successful at Shippo regardless of location.
We leverage AI to review all resumes during the application phase to ensure fairness, comprehensively evaluate each submission, and mitigate bias. However, all decisions at every stage of the process are made by a real person.