Senior Site Reliability Engineer (f/m/d)
Hive.com
Office
Berlin
Full Time
The Position
If you are excited about growing with Hive and love building reliable, scalable systems, this might be the right position for you!
You'll join as a Site Reliability Engineer, helping power our infrastructure, ensure platform reliability, and drive operational excellence across our systems.
What You'll Be Responsible For:
Below are some example topics we expect a Site Reliability Engineer to cover. One important note: we're not looking for someone who purely optimizes the technical aspects of their work, but for someone who does that work with their users (engineers, product teams, operations, …) and our customers in mind.
- Design, develop, and maintain scalable infrastructure and automation to support high availability, reliability, and performance across our platform using Terraform, Python/Ruby, and AWS.
- Optimize and manage our cloud infrastructure to ensure high performance, reliability, and cost efficiency.
- Architect, create, and maintain observability solutions that reflect the needs of the business and enable proactive incident prevention.
- Develop reliability engineering practices to be integrated across our entire operations platform, such as:
  - Service-level objectives (SLOs) and error budgets for critical business services
  - Automated monitoring and alerting systems for proactive incident detection and resolution
  - Infrastructure automation and self-service tools to empower engineering teams
  - Capacity planning and performance optimization to support our growing fulfillment network
Hive is building the leading operations platform for independent commerce. It's time for us to take the next step and invest deeply in our infrastructure and reliability foundations as a key enabler of our multi-product offering. The goal is better commerce operations for merchants, consumers, and our fulfillment network.
What You'll Be Doing:
- Infrastructure optimization: Design, build, and optimize cloud infrastructure to reduce latency, improve performance, and ensure high availability for mission-critical services.
- Scalable observability setup: Help shape and implement comprehensive observability solutions (monitoring, logging, tracing) to provide full visibility into system health and performance.
- Automation and tooling: Build self-service tools and automation pipelines that enable engineering teams to deploy and operate services reliably and efficiently.
- Reliability and incident management: Establish SLOs, conduct post-incident reviews, and implement preventive measures to continuously improve system reliability.
- Integration with product infrastructure: Work with engineering teams to embed reliability best practices into our product development lifecycle, enabling seamless operations and enhancing platform capabilities.
- Collaboration across the business: Assess stakeholder needs across the Hive organization to drive decision-making and technical approaches proactively, ensuring different use cases are covered properly and thoroughly.
Your Profile
We know – sometimes, you can't tick every box. We would still love to hear from you if you think you're a good fit!
- You have the skills: Strong proficiency in cloud infrastructure (AWS) and significant experience with distributed systems, container orchestration (Kubernetes/ECS), and infrastructure as code (Terraform).
- You get into the details: Hands-on experience with observability tools (Prometheus, Grafana, DataDog, or similar), CI/CD pipelines, and automation frameworks.
- You write code: Proficiency in a programming language such as Python, Go, or similar, and experience building automation tools and infrastructure solutions.
- You know the theory, and the practice: Solid understanding of SRE principles, distributed systems concepts, and reliability engineering best practices.
- You care about the craft: Excellent problem-solving skills and high attention to detail.
- You bring people along: Strong communication skills and the ability to work collaboratively in a team environment.
Our Offering
- Be part of the Hive: You will work with a highly driven team of exceptional and experienced people in all domains. People at Hive have previously worked at organizations such as McKinsey, Amazon, Shopify, Google, Flink, Blackstone, J.P. Morgan, and DHL. We believe in a culture of trust, collaboration, empowerment, and constructive feedback in a positive and inspiring work atmosphere.
- Make an impact: Join a young company with an entrepreneurial culture operating at lightning speed — we want you to grow with us.
- You will be valued: We offer attractive compensation, including virtual employee stock options for all full-time team members, plus hardware of your choice.
- We support your well-being: Benefit from 30 vacation days annually, with the opportunity for a sabbatical after three years with us, alongside a dedicated monthly wellness and productivity budget.
- We will get you set up: Operating system and hardware of your choice, additional tech equipment that you need, screens, you name it — we want to enable you to do your best work.
- There's more! Enjoy flexible working hours, free drinks and snacks in our offices in Berlin, Paris, Milan & Madrid, and join regular team events such as off-sites and workcations.
About Us
We're revolutionizing e-commerce operations.
At Hive, we empower brands to excel in the digital commerce era through our innovative operations platform. By combining cutting-edge technology with a curated network of top-tier operations partners, we deliver measurable results.
Our comprehensive platform streamlines the entire operational chain through a single, intuitive interface. Since our founding in 2020, we've rapidly grown to become one of Europe's leading operations platforms, partnering with hundreds of innovative brands. We have strategic locations in Berlin, Paris, Milan, Madrid, London, and Amsterdam. Backed by prestigious investors including Tiger Global, Earlybird, and Picus Capital, we're scaling our impact across Europe.
Diversity and inclusion are core to our success. We actively cultivate an environment where every team member, regardless of background, can thrive. We welcome talent from all walks of life, regardless of religion, ethnicity, nationality, gender, sexual orientation, age, marital status, or disability. At Hive, authenticity and professional growth go hand in hand.
October 6, 2025