Life at MX

We are driven by our moral imperative to advance mankind - and it all starts with our people, product and purpose. We always carry a deep sense of drive and passion with us. If you thrive in a challenging work environment, surrounded by incredible team members who will help you grow, MX is the right place for you.

Come build with us and be part of an award-winning company that’s helping create meaningful and lasting change in the financial industry.

About the Role:

We’re looking for an SRE Critical Incident Manager to lead and scale our Incident, Problem, and Change Management functions within the broader Site Reliability Engineering (SRE) domain. This is a mission-critical role focused not just on responding to incidents, but on building the long-term systems, processes, and culture that prevent them.

As a senior technical leader, you’ll serve as Incident Manager during major outages, lead postmortem practices, and partner across engineering to drive strategies that reduce risk and improve system resilience. At the same time, you’ll act as process owner, responsible for evolving our frameworks for incident response, change management, and operational maturity.

Responsibilities:

Incident Management & Leadership

Serve as the primary incident commander during high-severity incidents, driving cross-functional coordination, real-time decision making, and clear communication.
Own the postmortem process—ensuring accurate root cause analysis, actionable follow-ups, and knowledge sharing across teams.
Analyze incident trends to drive preventative strategies, reduce recurrence, and improve detection and response workflows.

Process Ownership & Transformation

Define, implement, and continuously evolve Incident Management best practices, including severity classification, on-call response models, and incident tooling.
Lead the development and transformation of Incident, Problem and Change Management processes, balancing velocity with safety, accountability, and auditability.
Drive alignment across engineering, security, and compliance teams on operational standards and controls.

Cross-Functional Influence & Enablement

Coach teams on operational readiness, including how to run effective retrospectives, perform risk analysis, and maintain clear SLOs.
Champion a blameless culture of learning, resilience, and transparency through playbooks, tooling improvements, and education.
Provide executive-level reporting and insights into reliability trends, change success rates, and incident learnings.

What You'll Bring:

8+ years of experience in SRE, Infrastructure, or DevOps engineering roles, including at least 2 years in a staff-level or leadership capacity.
Proven experience as an incident commander, with a calm, structured approach to managing critical service disruptions.
Ability to troubleshoot and triage issues, identify patterns, and drive issues to resolution.
Deep understanding of incident, problem and change management frameworks, and a passion for building sustainable, scalable and automated processes.
Strong technical skills in distributed systems, AWS/GCP infrastructure, observability (e.g., using Datadog APM tracing to debug issues such as latency in RabbitMQ), and CI/CD.
Excellent communication and facilitation skills—able to lead under pressure and align stakeholders at all levels.
Strong experience with tools such as PagerDuty, ServiceNow, Jira Service Management, and VictorOps.

Work Environment

In this role, a significant aspect of the job involves working in the office for a standard 40-hour workweek. We believe that the collaborative nature of our work and the face-to-face interactions among team members are essential for fostering a dynamic and productive work environment. Being present in the office enables seamless communication, facilitates quick decision-making, and encourages spontaneous collaboration that contributes to the overall success of our projects. We value the synergy that comes from having our team members physically together, allowing for immediate problem-solving, idea exchange, and team building.

Compensation

The expected earnings for this role may consist of a base salary and other forms of cash compensation, such as bonuses or commissions, as applicable.

This pay range is just one component of MX’s total rewards package. MX takes a number of factors into account when determining individual starting pay, including the job and level a candidate is hired into, location, skill set, and peer compensation.

Please note: applicants for this position must have the legal right to work in India without the need for sponsorship. We are unable to provide work sponsorship for this role, and candidates should be able to verify their eligibility to work in the country independently. Proof of eligibility to work in India will be required as part of the hiring process.

Critical Incident Manager IV

MX Technologies