Senior Director, Cloud Operations (SRE, SDM)
Granicus
Office
Bengaluru
Full Time
The Senior Director of Cloud Operations is responsible for the operational integrity, performance, and reliability of enterprise cloud environments. This role leads a global, data-driven operations team with a strong emphasis on incident management, service continuity, and continuous improvement, and reports directly to the Vice President of Cloud.
This position is responsible for leading a global team of cloud engineers, the SRE practice, and service management tools and operations, using a metrics-first approach.
What your impact will look like here:
- Cloud Infrastructure Operations
- Oversee the daily operations of cloud platforms (AWS, Azure, GCP), ensuring high availability and performance across global regions.
- Lead the development and execution of operational runbooks, SOPs, and escalation paths.
- Incident Management & Response
- Own the end-to-end incident management lifecycle: detection, triage, escalation, resolution, and post-incident review.
- Lead a global incident response team with 24/7 coverage, ensuring seamless handoffs across time zones.
- Implement real-time monitoring, alerting, and automated remediation to reduce MTTD and MTTR.
- Use data analytics to identify incident trends, recurring issues, and systemic risks.
- Conduct blameless postmortems and ensure corrective actions are prioritized and tracked to closure.
- Data-Driven Operational Leadership
- Build and lead a global team of cloud engineers, SREs, and operations analysts using a metrics-first approach.
- Define and track operational KPIs (e.g., uptime, incident frequency, resolution time, change success rate) to drive accountability and performance.
- Leverage dashboards and analytics platforms (e.g., Datadog, Grafana, Splunk, ServiceNow) to provide real-time visibility into system health and team performance.
- Use data to inform staffing models, on-call rotations, and workload balancing across regions.
- Foster a culture of continuous improvement through data-backed retrospectives and operational reviews.
- AI-Enabled Focus
- Drive AI and ML adoption in operational workflows (e.g., predictive monitoring, incident pattern analysis) to improve uptime and automate repetitive tasks.
- Define and execute an AI-driven observability strategy, using tools such as AIOps platforms for intelligent alerting and root cause analysis.
- Collaborate with Engineering, Security, and Product teams to embed AI-enabled automation in areas such as deployment pipelines and change management.
- Establish and maintain SLOs/SLAs leveraging AI-generated insights to prioritize engineering work that improves reliability and customer experience.
- Oversee incident management, post-mortems, and continuous improvement, incorporating AI tools for impact analysis and knowledge retention.
- Operational Governance
- Define and enforce SLAs, SLOs, and operational KPIs.
- Ensure compliance with security, regulatory, and audit requirements.
- Manage change control, configuration management, and release processes to minimize operational risk.
- Cost & Vendor Management
- Monitor and optimize cloud spend through cost governance and usage analysis.
- Manage vendor relationships, contracts, and service-level agreements.
- Collaboration & Communication
- Partner with engineering, security, and business teams to align operations with product and service goals.
- Provide regular reporting and updates to executive leadership on operational health, risks, and incident trends.
- Education
- Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.
- Experience
- 14+ years in IT operations, with 7+ years in cloud infrastructure and operations leadership.
- Proven experience leading global teams and managing high-severity incidents in large-scale environments.
- Skills
- Deep expertise in cloud operations, incident response, and service reliability.
- Strong knowledge of ITIL, SRE, and DevOps practices.
- Proficiency in operational analytics and observability tools.
- Excellent leadership, communication, and cross-functional collaboration skills.
- Strong presentation skills, including experience presenting to large global audiences.
- Certifications (Preferred)
- AWS Certified DevOps Engineer – Professional
- Azure Administrator Associate
- ITIL Foundation or Practitioner
August 4, 2025