
Site Reliability Engineer
BlaBlaCar
About BlaBlaCar
BlaBlaCar is the world’s leading community-based travel app, enabling 27 million members a year to carpool or travel by bus in 21 countries. Our team of 800 employees represents over 50 nationalities and is spread across 5 global offices, with 30% working fully remotely.
Your Mission
By joining our Foundations department, you will work alongside talented people organized in small agile teams, each with strong ownership of its part of the department's goals. Foundations is composed of seven teams that “provide consistent, easy-to-use infrastructure, services, and expertise to support BlaBlaCar’s growth and evolution”.
The Site Reliability Engineering (SRE) team is responsible for providing best-in-class observability, alerting, and incident management tools and processes to service teams. As an enabling team, we help BlaBlaCar engineers efficiently improve the reliability of their services. Empowering developers and sharing our reliability expertise with them are at the core of our daily work.
Technical stack:
- Core Infrastructure: Kubernetes, Google Cloud Platform
- GitOps/Delivery: GitHub, Terraform, Flux, Helm, Jenkins
- Observability/Incident Management: Datadog, OpenTelemetry, Grafana IRM
- In-house Synthetic Tests platform: Playwright, Qualcium, SauceLabs
- Languages: Go/Python for tooling, TypeScript/JavaScript for the testing platform
Your responsibilities
- Support software engineers by creating, maintaining, and improving observability and alerting tools and frameworks. You embrace AI, including agentic AI, to eliminate toil and streamline your daily tasks.
- Own the Service Level Objectives (SLO) framework and help design and maintain Service Level Indicators (SLIs) and objectives to ensure service reliability.
- Own the incident management process by defining best practices and standards, and by driving continuous improvement through post-mortems and chaos engineering. While developers handle incidents within their own scope, you may step in as Incident Commander during high-severity incidents to lead coordination efforts.
- Develop and maintain tools, such as Terraform modules or Go applications, that automate and improve reliability across services.
- Build and promote reporting on operational metrics and incidents to drive distributed, continuous improvement.
Your qualifications
- 1 to 5 years of experience in SRE, DevOps, or software engineering roles
- Strong communication skills: working in a multidisciplinary environment, you'll need to adapt your communication to other teams' level of expertise and understand their needs
- Strong knowledge of observability tools (e.g., Datadog) and understanding of metrics, logging, and tracing
- Troubleshooting and on-call experience in production environments, diagnosing and resolving technical issues effectively (experience with Kubernetes is a plus)
- Full working proficiency in English
- Fit with our BlaBlaPrinciples
- Thriving in a collaborative, fast-growing, and innovative environment
- Ability to take ownership, align with business priorities, and navigate different contexts
Nice to have:
- Familiarity with incident management platforms (e.g., Grafana IRM)
- Experience working with Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Exposure to programming in Go, or a strong interest in learning it
- Experience integrating OpenTelemetry
- Backend services are built in multiple programming languages: while development skills aren't required, familiarity with object-oriented programming and scripting languages is an advantage
- Familiarity with web/mobile testing tools, or a strong curiosity about how software is tested at scale
What we have to offer
- Hybrid role: 2-3 days at the office
- 4 additional weeks on top of statutory maternity/paternity leave
- 50% healthcare coverage (Alan)
- Financial support for home office equipment
- Minimum 25 days of holiday per year
- Local meal plan policy (Swile card)
- 50% of transportation costs covered (Forfait Mobilité Durable)
- Free unlimited carpooling and bus rides
- Personal growth through training, mentorship, and internal mobility opportunities
- Employee stock ownership plan
- Regular team-building events
- 1 day off per year to test our product
Interested in joining the ride?
- A 45-min video call with Maxime, Talent Acquisition Manager, to get to know you, understand your career expectations, and answer your questions
- A 60-min video call with Damien Bertau, Hiring Manager, to discuss your experience and share more details about the team
- A 90-min system design interview with 2 team members to discuss your technical expertise
- A 45-min video call with Maxime Fouilleul, Head of Foundations, to get a wider view of the department and its strategy