Site Reliability Engineer

TEKEVER

Posted 9 days ago

About this role

Are you ready to revolutionise the world with TEKEVER? 🚀🌍

At TEKEVER, we are the European leader in unmanned technology, where cutting-edge advancements meet unparalleled innovation.

💻 Digital | 🛡️ Defence | 🔒 Security | 🛰️ Space

We operate across four strategic areas, combining artificial intelligence, systems engineering, data science, and aerospace technology to tackle global challenges — from protecting people and critical infrastructure to exploring space.

We offer a unique surveillance-as-a-service solution that delivers real-time intelligence, enhancing maritime safety and saving lives. Our products and services support strategic and operational decisions in the most demanding environments — whether at sea, on land, in space, or in cyberspace.

🌐 Become part of a dynamic, multidisciplinary, and mission-driven team that is transforming maritime surveillance and redefining global safety standards.

At TEKEVER, our mission is to provide limitless support through mission-oriented game-changers, delivering the right information at the right time to empower critical decision-making.

If you're passionate about technology and eager to shape the future — TEKEVER is the place for you. 👇🏻🎯

Mission:

As a Site Reliability Engineer (SRE), you will be a key player in ensuring our production systems are highly available, scalable, and performant. You will bridge the gap between development and operations, applying a software engineering mindset to system administration topics. You'll be responsible for building and maintaining large-scale, fault-tolerant distributed systems, with a strong focus on automation, operational excellence, and reliability under real-time, high-throughput constraints. The ideal candidate has a strong background in software engineering and systems administration, with a passion for solving operational problems with code. 

What will be your responsibilities:

  • System Reliability & Availability: Design, build, and maintain highly available, scalable infrastructure for distributed and stateful workloads, supporting real-time data ingestion, AI inference pipelines, and hybrid cloud/edge deployment. 

  • Automation & Toil Reduction: Automate repetitive manual tasks, infrastructure provisioning, and operational workflows to reduce toil and improve system efficiency. 

  • Monitoring & Alerting: Implement and manage robust monitoring, logging, and alerting solutions to proactively detect and address issues. Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

  • Incident Response & Management: Participate in an on-call rotation to respond to production incidents. Lead blameless post-mortem analyses for incidents in complex distributed systems, identifying root causes and systemic weaknesses, and implementing long-term preventative measures.

  • Infrastructure as Code (IaC): Manage and provision cloud and on-premise infrastructure using IaC principles and tools like Terraform and Ansible. 

  • Performance & Capacity Planning: Conduct performance analysis, system tuning, and capacity planning to ensure our services meet performance and cost-efficiency goals. 

  • Disaster Recovery: Develop, test, and maintain disaster recovery plans and business continuity strategies to ensure service resilience. 

  • Collaboration: Work closely with software development teams to consult on system design, platform choices, and reliability best practices for new features and services. 

  • Documentation: Create and maintain comprehensive documentation for system architecture, runbooks, and operational procedures. 

Profile and requirements:

  • Education: Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field. 

  • Experience: 3+ years of experience in Site Reliability Engineering, DevOps, or a related software/systems engineering role. 

  • Technical Skills: 

      • Proficiency in one or more programming languages such as Python, Go, or Bash for automation and tooling.

      • Deep understanding of Linux/Unix operating systems and networking fundamentals (TCP/IP, DNS, HTTP, load balancing).

      • Experience with cloud platforms such as AWS, Azure, or Google Cloud, with a focus on Google Cloud.

      • Strong knowledge of CI/CD tools like Jenkins, GitLab CI, or CircleCI.

      • Strong hands-on experience operating Kubernetes in production, including troubleshooting of networking, storage, scheduling, autoscaling, and stateful workloads.

      • Experience with Infrastructure as Code (IaC) tools such as Terraform and Ansible.

      • Understanding of version control systems (e.g., Git) and of CI/CD principles and tools (e.g., GitLab CI, Jenkins).

      • Knowledge of monitoring, logging, and tracing tools (e.g., Prometheus, Grafana, ELK stack).

  • Analytical Skills: Strong analytical and problem-solving skills, with an ability to diagnose and resolve complex issues in distributed systems. 

  • Communication: Excellent verbal and written communication skills, with the ability to effectively collaborate with technical and non-technical stakeholders. 

  • Attention to Detail: High attention to detail and a commitment to ensuring the accuracy and quality of work. 

  • Adaptability: Ability to thrive in a fast-paced, dynamic environment and manage multiple projects simultaneously. 

What we have to offer you:

  • An excellent work environment and an opportunity to create a real impact in the world; 

  • A truly high-tech, state-of-the-art engineering company with a flat structure and no politics;

  • Working with the very latest technologies in Data & AI, including Edge AI and swarming, both within our software platforms and within our embedded on-board systems;

  • Flexible work arrangements; 

  • Professional development opportunities; 

  • Collaborative and inclusive work environment; 

  • Salary compatible with the level of proven experience. 

Do you want to know more about us?

Visit our LinkedIn page at https://www.linkedin.com/company/tekever/

Job details

Workplace: Office

Location: Lisboa, Portugal

Job type: Full Time
