Empower every employee.

Our mission is to be the world's most used AI employee experience platform by changing the way frontline employees work.

Flip is the leading AI-powered employee experience platform for frontline workers. We're transforming how the people who keep the world running — in retail, manufacturing, and logistics — do their jobs. One app. One touch. Everything they need.

Our mission: Connect every employee to everything they need in one touch.

Job Teaser

As a Senior Site Reliability Engineer in our Platform Squad, you'll own critical reliability domains end-to-end and drive the squad's technical direction: leading architectural decisions on our platform, mentoring teammates, and continuously raising the reliability bar.

This role is for an engineer with a proven track record of building and operating high-throughput, highly available systems, who wants senior-level technical ownership and real impact through deep engineering work inside a tight, well-scoped team.

What awaits you with us

Co-own the architecture: Help drive the architecture and evolution of our cloud infrastructure on Azure and our Kubernetes clusters, designed for high throughput and high availability, to support Flip's rapid growth across the globe.
Drive the resilience strategy: Define how we approach global scaling, zero-downtime deployments, rollback mechanisms and disaster recovery, and make sure the platform stays available around the clock.
Evolve our observability stack: Improve our LGTM stack (Loki, Grafana, Tempo, Mimir) into a foundation our engineers can trust.
Improve our IaC platform: Eliminate toil at the source and make our infrastructure truly self-service for engineering teams.
Lead in incidents: Take a leading role in platform-related major incidents, drive blameless post-mortems for the squad, and translate findings into systemic improvements.
Mentor within the squad: Coach teammates, run RFCs and design reviews inside the team, and help engineers grow into stronger SREs.
Shape our roadmap: Partner with your squad to define the platform's direction.

What you bring to the table

We're looking for a hands-on, SaaS-minded Senior Site Reliability Engineer who treats scalability and reliability as first-class product concerns.

Must-Have Qualifications

5+ years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
Proven track record building and operating high-throughput, highly available systems in production.
Deep, production-level experience with Kubernetes on any hyperscaler.
Strong experience with modern observability stacks (e.g. Prometheus, Mimir, VictoriaMetrics, Dash0, Loki, ELK) and a clear point of view on SLIs, SLOs and error budgets.
Solid software development skills in Go (strongly preferred, since our IaC runs on Pulumi in Go) or Python.
Hands-on experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform), GitOps (e.g. Argo CD), and CI/CD pipeline design.
Demonstrated ability to lead complex infrastructure initiatives from design to production - including writing RFCs and driving architecture decisions within your team.
Experience mentoring engineers and raising the technical bar within a team.
Comfortable owning major incidents end-to-end and turning learnings into systemic change.
Strong communication skills and business-fluent English.
Willingness to participate in on-call rotations to ensure the reliability of our platform.

Nice-to-Have Qualifications

Rolled out production-ready API gateways with Gateway API (e.g. Envoy Gateway).
Operated multi-cluster service meshes (e.g. Cilium, Linkerd, Istio).
Deployed and maintained Kubernetes Operators (e.g. Strimzi, CNPG).
Operated highly available PostgreSQL in production.

What we offer you

Work mode: We’re remote-first, giving you flexibility to work from home. At the same time, we deeply value the power of in-person collaboration. Depending on the role, you’ll join occasional team events, workshops, or meetings in our Berlin or Stuttgart offices - always with plenty of notice. The exact balance will be discussed during your interview.
Work-life balance: We don't want you to put down roots at your desk chair. That's why we cover the costs of your E-Gym-Wellpass membership and offer job bike leasing.
Celebrating success: Expect highly motivated and committed people in a relaxed working atmosphere.
Be part of something bigger: You actively shape Flip in your role. Along the way, you help drive the rapid growth of a young tech company and grow towards your own goals. Fun is guaranteed.
Happy to be a Flipster: Stay tuned for regular team events and culture days that bring us together as Flipsters.
Working abroad: At Flip, you can also work from abroad within the European Union.

Senior Site Reliability Engineer (m/f/d)