Technical Summary
You will support the reliability and scalability of services across AWS, Azure, GCP, and Oracle by executing automation, CI/CD, observability, and container orchestration tasks. You will work closely with senior engineers to ensure production systems are stable, well-monitored, and continuously improving.

Responsibilities

- Implement and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, ELK, OpenTelemetry)
- Build and maintain CI/CD pipelines and automation for deployments and testing
- Support containerized workloads using Docker and Kubernetes; manage Helm charts and deployments
- Contribute to incident response, troubleshooting, and postmortem documentation
- Implement infrastructure-as-code (IaC) patterns (Terraform, CloudFormation, ARM templates) under guidance
- Collaborate with developers to improve service reliability and operational readiness
- Participate in continuous platform improvements led by senior and principal engineers

Must-have Qualifications

- 3–5 years of experience in operations, DevOps, or SRE roles
- Hands-on experience with containers and orchestration (Docker, Kubernetes)
- Familiarity with IaC tools (Terraform, Ansible, or similar)
- Experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD, or similar)
- Proficiency in at least one scripting language (Python, Bash, Go)
- Associate-level cloud certification (AWS, Azure, GCP, Oracle, or CompTIA Cloud+)
- Availability for weekend and holiday shifts as part of the standard scheduling rotation

Nice-to-have Skills

- Exposure to SLOs/SLIs and error budgets
- Familiarity with chaos testing or service mesh

Datavail is a leading provider of data management, application development, analytics, and cloud services. Its more than 1,000 professionals help clients build and manage applications and data through a world-class tech-enabled delivery platform and software solutions across all leading technologies. For more than 17 years, Datavail has worked with thousands of companies across industries and sizes. It is an AWS Advanced Tier Consulting Partner, a Microsoft Solutions Partner for Data & AI and Digital & App Innovation (Azure), an Oracle Partner, and a MySQL Partner.

Site Reliability Engineer
