PA2025TAFAOFFTECH3 Sr. Site Reliability Engineer (SRE), Cloud Incident Response

SS&C Technologies

Office

Bangkok, Thailand

Full Time

As a leading financial services and healthcare technology company based on revenue, SS&C is headquartered in Windsor, Connecticut, and has 27,000+ employees in 35 countries. Some 20,000 financial services and healthcare organizations, from the world's largest companies to small and mid-market firms, rely on SS&C for expertise, scale, and technology.

Job Description

Overall Job Purpose:

Be part of a global team that ensures the performance, scalability, and reliability of critical cloud-based applications. As part of the Global Investor and Distribution Solutions (GIDS) Platform Services team, you’ll play a key role in keeping our systems running smoothly and efficiently—while helping shape the future of our platform.

What You’ll Do:

Collaborate with global teams as part of a follow-the-sun support model.
Respond to, troubleshoot, and resolve Level 2 application incidents.
Ensure critical applications are effectively monitored using tools like Prometheus and Grafana.
Create and maintain dashboards and alerts to enhance visibility into application health.
Define, implement, and track key SRE metrics (SLOs, SLIs, error budgets).
Partner with development teams to improve application reliability and resilience.
Analyze incident trends and recommend improvements to reduce recurrence.
Automate repetitive support tasks to improve efficiency.
Participate in post-incident reviews and drive reliability initiatives.
Bachelor’s degree in Computer Science, Computer Engineering, IT, or related field.
5+ years of experience for senior roles.

Qualifications:

Minimum Qualifications

Proficiency in one or more programming languages, preferably Java, JavaScript or Python.
Proven ability to troubleshoot complex systems.
Skilled in debugging, code optimization, and automation.
Experience with relational databases and data analysis.
Strong written and spoken English communication skills are a requirement of this role.
Experience working in Site Reliability Engineering (SRE) roles or incident response environments.
Hands-on experience with cloud infrastructure, preferably AWS.
Familiarity with observability tools such as Grafana, ELK Stack, or similar.

Highly Preferred

Experience deploying and managing applications on Kubernetes platforms.
Strong skills in analyzing and troubleshooting issues in large-scale, distributed systems.
Experience with infrastructure automation languages and tools such as Terraform and Ansible.
Strong networking fundamentals (firewalls, security groups).

Benefits:

Hybrid and international work environment: 6 days in the office per month.
Office in the heart of Bangkok, opposite EmSphere, with easy access from both the BTS and MRT.
Flexible working hours.
Annual leave from 12 and up to 25 days.
Additional leave types such as business leave, sick leave, maternity leave, paternity leave, bereavement leave, etc.
Flexible Time Off (FTO) in addition to the leave types above.
Group health insurance, optical claims, and an annual health checkup.
Provident fund of up to 11% from the employer and 15% from the employee.
Professional development support.
Confidential Employee Assistance Program for mental health and well-being support.
Welfare committee.

Unless explicitly requested or approached by SS&C Technologies, Inc. or any of its affiliated companies, the company will not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services.

SS&C Technologies is an Equal Employment Opportunity employer and does not discriminate against any applicant for employment or employee on the basis of race, color, religious creed, gender, age, marital status, sexual orientation, national origin, disability, veteran status or any other classification protected by applicable discrimination laws.