Site Reliability Engineer - Fixed Term Contract

MMT Digital

Office

Barcelona, Catalonia, Spain

Temporary

The Role

We are seeking an experienced Site Reliability Engineer to play a pivotal role in bridging the gap between software engineering and operations. This role emphasizes designing robust solutions, mentoring teams, and driving performance improvements for both internal and client systems through expertise in automation, scalability, and system reliability. 

As a Site Reliability Engineer, you will be responsible for owning the uptime and performance of critical infrastructure and applications while working closely with clients to align reliability goals with their business objectives. 

Key Responsibilities

System Reliability & Performance 

  • Own the uptime and performance of critical infrastructure and applications 
  • Design scalable, fault-tolerant architectures that meet business needs while maximizing operational efficiency 
  • Define and govern Non-Functional Requirements (NFRs) such as availability, performance, and maintainability for internal and client systems 
  • Proactively identify opportunities for system optimization, scalability, and cost reduction 

Automation & Infrastructure

  • Design and implement automation for monitoring, incident response, and repetitive operational tasks 
  • Implement Infrastructure as Code (IaC) practices with tools like Terraform, ARM templates, and CloudFormation for consistent cloud environment provisioning
  • Design and deploy containerized solutions using Docker and Kubernetes on cloud platforms 
  • Set up and manage CI/CD pipelines and frameworks tailored for cloud-native applications using GitHub Actions and Azure DevOps 

Cloud Infrastructure & Operations 

  • Participate in architectural decisions and implement cloud infrastructure solutions using Azure and AWS services, ensuring high availability and scalability 
  • Manage and optimize cloud resources to improve performance, cost efficiency, and security 
  • Apply cloud-native best practices to secure and govern cloud environments, ensuring compliance with industry standards 
  • Integrate advanced monitoring and alerting tools (e.g., Datadog, CloudWatch, Application Insights) to maintain system observability in multi-cloud environments 

Incident Management & Analysis 

  • Lead incident response, conduct root cause analyses, and produce blameless postmortems to prevent future occurrences 
  • Collaborate with development teams to integrate observability and performance metrics into the development lifecycle 
  • Build and maintain executive-level and developer-centric dashboards to visualize key metrics 

Technical Expertise

  • Proven experience running and maintaining production systems, with expertise in triaging and resolving incidents
  • Proficiency in automation and configuration management tools (e.g., Terraform, Ansible) 
  • Expertise in cloud platforms, particularly Azure and AWS, and their associated tools 
  • Strong programming skills, with a primary focus on Python, for developing automation scripts, creating custom tooling, and optimizing operational workflows 
  • Experience with modern observability platforms such as Datadog 

Skills & Experience

  • A solid foundation in system architecture, with a focus on scalability and reliability 
  • Exceptional problem-solving skills and a data-driven mindset 

Desirable Requirements

  • Experience with container orchestration tools such as Kubernetes 
  • Familiarity with CI/CD pipelines and tools like GitHub Actions and Azure DevOps 
  • Knowledge of security best practices in cloud and hybrid environments 

September 18, 2025