Site Reliability Engineer

Critical Manufacturing.com

Office

Maia, Porto District, Portugal

Full Time

Critical Manufacturing is dedicated to empowering high-performance operations to make Industry 4.0 a reality with the most innovative, comprehensive, and modular MES software. We have a global presence, but our headquarters and main technical center are in Porto (Maia), Portugal, where we develop a state-of-the-art solution for the Semiconductor, Electronics, Medical Devices, and Industrial Equipment industries.

Recognized as a Leader by Gartner, we are part of ASMPT, the world's largest supplier of best-in-class equipment and a technological process partner for the electronics and semiconductor industries.

The Role 

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you will be responsible for keeping an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating manual work through automation.
 
SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow. 

Requirements

What You Will Do 

  • Analyze and interpret distributed systems telemetry (metrics, logs, traces) to identify and address potential issues before they affect users
  • Design, build, and maintain monitoring, alerting, and reliability tooling that improves system visibility and operational excellence
  • Collaborate with software and infrastructure teams to improve resilience, scalability, and performance across our platform
  • Participate in incident response and post-mortem analysis to ensure continuous learning and improvement
  • Contribute to automation efforts that reduce toil and increase engineering productivity

 
What Success Looks Like 

Within your first year, you will have: 

  • Improved reliability and observability of key production systems
  • Reduced manual operational work by automating recurring processes
  • Partnered effectively with development teams to embed SRE best practices into the software lifecycle
  • Shaped scalable approaches to telemetry, monitoring, and incident response

 
Why Join Us 

  • Be part of a company shaping the future of manufacturing software
  • Enjoy the freedom to experiment, innovate, and create systems that will last
  • Join a team where storytelling, strategy, and technology meet to make Industry 4.0 real

 

What You Will Bring 

  • More than 2 years of experience as a Site Reliability Engineer
  • A passion for investigation and problem-solving, digging deep until you understand how things work
  • A strong belief that telemetry is essential for system health and continuous improvement
  • Excellent spoken and written English skills

What we consider a plus (not mandatory):

  • Experience with cloud infrastructure (e.g., Azure) or container orchestration platforms (e.g., Kubernetes, OpenShift)
  • Familiarity with Docker, Terraform, and reverse proxies (e.g., Traefik)
  • Hands-on experience designing, analyzing, and troubleshooting large-scale distributed systems
  • Ability to debug, optimize performance, and automate repetitive tasks
  • Strong problem-solving mindset

Diversity, Equity and Inclusion are a source of commitment and innovation 

At Critical Manufacturing, we welcome and encourage applications from individuals of all backgrounds, regardless of disabilities, diverse abilities, identities, or experiences. Our commitment is to create an inclusive environment where everyone has equal opportunities to succeed and thrive.  

If you need any accommodations during the recruitment process, please let us know. We're happy to support you.

January 9, 2026

CriticalMfg