
Staff Site Reliability Engineer
EarnIn
Posted about 3 hours ago
About EarnIn
As one of the pioneers of earned wage access, EarnIn is passionate about building products that deliver real-time financial flexibility for people living paycheck to paycheck. Our community members access their earnings as they earn them, with options to spend, save, and grow their money without mandatory fees, interest rates, or credit checks.
We’re fortunate to have an incredibly experienced leadership team, combined with world-class funding partners like A16Z, Matrix Partners, DST, Ribbit Capital, and a very healthy core business with a tremendous runway. We’re growing fast and are excited to continue bringing world-class talent onboard to help shape the next chapter of our growth journey.
POSITION SUMMARY
Lead EarnIn's shift to AI-first reliability engineering. Define how AI transforms on-call, incident response, alert triage, postmortems, and production investigations across SRE and product engineering teams, while setting SLO-driven standards and resilience patterns that enable the company to ship fast and stay safe.
The base salary range for this full-time position is $252,000-$308,000, plus equity and benefits. Our salary ranges are determined by role, level, and location. This is a hybrid position in Mountain View (Headquarters) and will require in-office work 2 days a week.
WHAT YOU'LL DO
- Set a reliability strategy with AI at the center. Define SLIs, SLOs, and error budgets across critical services. Use AI to surface trends, predict capacity risks, and auto-generate reliability scorecards so teams act on data.
- Redesign the incident lifecycle around AI-assisted speed. Lead high-severity incident response as IC. Build AI-driven alert correlation and triage that reduces noise and accelerates root-cause identification. Drive adoption of AI-generated postmortems that surface systemic patterns and automatically track corrective actions through to completion.
- Make on-call fundamentally better through automation. Build AI agents that draft runbook responses, pull relevant context from Datadog, incident.io, and Slack during pages, and recommend remediation steps, so on-call engineers spend less time deciding and searching.
- Push AI-first operations into product engineering teams. Partner with product engineering to embed AI-assisted investigation, alerting, and production readiness into their workflows. Make AI tooling the default path for every team that owns a service, not an SRE-only capability.
- Architect for resilience at scale. Guide service designs for graceful degradation, failure isolation, and capacity planning across EarnIn's AWS footprint (EKS, Kafka, DynamoDB, RDS, SQS). Use AI-driven analysis to identify architectural weak points before they become incidents.
- Raise the bar through mentorship and standards. Coach engineers on reliability practices, run design and incident reviews, and build documentation and tooling that makes reliability knowledge accessible. Set the expectation that AI-assisted workflows are how EarnIn operates, not an experiment.
WHAT WE'RE LOOKING FOR
- 7+ years in SRE, Software Engineering, or Infrastructure Engineering with increasing scope and cross-org influence. Track record of KPI-driven reliability and operational excellence improvements at scale.
- Demonstrated experience applying AI/LLMs to operational workflows in production: alert triage/resolution, runbook automation, incident investigation, postmortems, or agentic operations tooling. Not a theoretical interest, but shipped work.
- Significant expertise with SLOs/SLIs, error budgets, incident command, and blameless postmortems in large-scale distributed systems. You have driven follow-through that actually prevented recurrence.
- Meaningful software engineering ability (Python, Go, or similar). You build tools and automation, not just dashboards.
- Deep observability experience (Datadog, CloudWatch, OpenTelemetry) with pragmatic, signal-heavy alerting designed for real human response, enhanced by AI-driven noise reduction.
- Solid infrastructure-as-code proficiency (Terraform, Kubernetes, AWS) with safe, reversible deployment practices.
- Proficiency with AI-assisted development tools (Cursor, Claude Code, Copilot) as part of your software development workflow, both to accelerate your own engineering work and to model that behavior for the teams you partner with.
- Experience in fintech or regulated environments (SOC 2, PCI) and familiarity with FinOps or cost/performance tradeoffs in high-scale systems are a plus.
#LI-Hybrid
At EarnIn, we believe that the best way to build a financial system that works for everyday people is by hiring a team that represents our diverse community. Our team is diverse not only in background and experience but also in perspective. We celebrate our diversity and strive to create a culture of belonging.