Jolera

Senior Site Reliability Engineer

Posted 8 days ago

Office | Colombo, Western Province, Sri Lanka | SE

Job Purpose

Lead a team of site reliability engineers who deliver incident detection, triage, and runbook-based remediation for production cloud-native environments in support of our North American customers. Set the operational standard for triage and recovery, act as the senior escalation point, and serve as the primary technical liaison to the Service Delivery Manager.

Key Responsibilities

• Lead incident detection, triage, and first response across production cloud and Kubernetes environments in support of our North American customers.

• Execute and oversee approved runbooks for service restoration — workload and node restarts, scaling, rollbacks, and database stabilization — within agreed operational boundaries.

• Act as the senior escalation authority; prepare clear escalation summaries covering impact, actions taken, current state, and recommended next steps.

• Author, review, and maintain operational runbooks; continuously improve detection, alerting, and automation.

• Engage cloud-provider support (AWS, GCP) for platform-level failures and vendor escalations.

• Technically supervise and mentor the SRE team; review handoffs and ensure consistency across shifts.

• Own daily shift handoffs and contribute to monthly service reporting and reviews.

People Management

• Provide technical leadership and day-to-day supervision.

• Contribute to coaching, performance input, and skills development; formal line management sits with the Service Delivery Manager.

Financial Responsibility

• Accountable for protecting service levels and cost-to-serve through efficient, automation-first operations.

Key Performance Indicators (KPIs)

• Service-level (SLO/SLA) attainment

• Mean time to acknowledge / mean time to resolve

• Runbook coverage and quality

• Escalation accuracy and completeness

• Shift-handoff quality and reporting timeliness

• Repeat-incident reduction and automation adoption

Requirements

Education & Certifications

• Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

• Certified Kubernetes Administrator (CKA) required; CKAD, AWS, and Google Cloud certifications strongly preferred.

Experience

• 7+ years in SRE, DevOps, or production infrastructure operations, including 3+ years operating Kubernetes in production.

• Proven track record leading incident response for production cloud workloads.

• Managed-services / MSP or 24×7 operations experience preferred.

Skills & Competencies

Technical Skills

• Kubernetes operations across AWS EKS and GCP GKE

• AWS and GCP core services (compute, storage, networking, scaling, IAM)

• Relational database operational recovery (e.g., PostgreSQL)

• Observability platforms (e.g., Datadog)

• Scripting and automation (Bash, Python, Go or equivalent); read-level Terraform/IaC

• Incident command and structured troubleshooting

Soft Skills

• Calm, decisive incident leadership under pressure

• Clear written and verbal English

• Mentoring and team collaboration

• Time management

Tools / Software

• Datadog

• Jira / ServiceNow

• Confluence / GitHub Wiki

• AWS & GCP consoles

• Slack / Microsoft Teams

Benefits

What We Offer

• Competitive compensation package

• Competitive benefits package

• Company perks, Good Life gym, and various brand discounts

• Company events, recognitions, and celebrations

• Career development and growth opportunities

Job details

Workplace: Office
Location: Colombo, Western Province, Sri Lanka
Experience: SE

Delivering scalable managed IT and cybersecurity solutions for businesses and IT partners around the world.

Employees: 405
Industry: IT Services and IT Consulting
Headquarters: Toronto, ON
Founded: 2001
Company location: 365 Bloor Street East, Second Floor, Toronto, ON M4W 3L4, CA
Specialties: Managed IT Services, Helpdesk Services, Professional Services, Development Services, Procurement Services, Data Protection, Cloud-Based Solutions, Technical Support, Hardware Implementation, IT Infrastructure Consulting, IT Development, Cybersecurity, AI Business, Global Services Integrator, and AI Services

Key team members

Richard Guise
