
About this role
Be a part of something BIG!
We are seeking a highly skilled and detail-driven Cloud Engineer (Senior) to ensure the reliability, availability, and operational excellence of cloud platforms supporting mission-critical telco services. The role includes leading on-call incident response, driving service recovery, and improving operational resilience through automation and continuous improvement.
Make an Impact by:
Cloud Operations & Reliability
- Operate and maintain production cloud platforms to meet telco-grade availability and performance targets.
- Proactively identify operational risks and prevent incidents.
- Ensure operational readiness of cloud platforms for 24x7 support.
On-Call & Incident Leadership
- Participate in and lead scheduled on-call rotations.
- Act as incident lead for high-severity or complex cloud incidents.
- Drive service restoration within agreed SLAs and MTTR targets.
- Coordinate incident response across cloud, network, security, and application teams.
- Ensure clear and timely communication to stakeholders.
Incident, Problem & Change Management
- Perform root cause analysis and implement corrective and preventive actions.
- Reduce incident recurrence through operational improvements.
- Review and approve standard operational changes within delegated authority.
Automation & Continuous Improvement
- Improve on-call effectiveness through automation, self-healing, and alert optimization.
- Enhance runbooks and operational documentation based on on-call learnings.
- Drive operational readiness for new services prior to production release.
Security, Compliance & Governance
- Ensure on-call actions comply with security, regulatory, and change controls.
- Support audits and vulnerability remediation related to cloud operations.
Mentorship & Collaboration
- Provide on-call guidance and escalation support to junior engineers.
- Share operational best practices and lessons learned across teams.
Incident Management
- Managing major incidents impacting customer-critical telco services.
- Balancing rapid service recovery with strict change and security controls.
- Troubleshooting complex hybrid or network-integrated cloud environments.
Decision-Making Authority
- Lead on-call incident response and recovery actions.
- Approve and implement low-risk operational changes.
- Recommend improvements to on-call processes, tooling, and architecture.
- Escalate major risks, outages, and compliance issues.
Skills for Success:
- Bachelor’s degree in IT, Computer Science, Engineering, or equivalent experience.
- 3–6 years of experience in cloud operations or infrastructure roles.
- Strong hands-on experience with AWS, Azure, or GCP in production environments.
- Proficiency in IaC tools such as Terraform, Bicep, and CloudFormation to standardize the configuration of cloud resources.
- Proven ability to monitor, troubleshoot, and resolve complex cloud platform issues, leveraging logs, metrics, and alerts across multi-cloud environments.
- Solid understanding of cloud networking, security, and IAM.
- Experience with ITSM processes and on-call operations.
Your career growth starts here. Apply Now!