Site Reliability Engineer - AML Global Recommendation - USDS

TikTok.com

Office

New York, New York, United States

Full Time

About the Team:
Site Reliability Engineering (SRE) of the AML (Applied Machine Learning) team combines system engineering and the art of machine learning to develop and run a massively distributed AI/ML recommendation system for the United States and all around the world.

On the SRE team, you'll have the opportunity to sharpen your expertise in coding, performance analysis, and large-scale systems operation. Join us and you'll have the chance to shape the future of AML systems and make a real, tangible impact on TikTok users.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
- Design, build, and maintain highly available, scalable, and fault-tolerant systems.
- Monitor and analyze system performance, identifying and resolving issues before they cause user impact.
- Develop and maintain automated monitoring, alerting, and incident response systems.
- Collaborate closely with software engineering teams to ensure that applications are designed with reliability, scalability, and performance in mind.
- Implement and maintain security best practices and ensure compliance with regulatory requirements.
- Participate in on-call rotations and respond to issues and incidents within and outside of normal business hours.
- Conduct root cause analysis of incidents, hold post-mortem reviews with stakeholders, and implement preventative measures to minimize the risk of similar incidents recurring.

September 18, 2025
