
About this role
TikTok’s Generalized Architecture US Tech and Operations team is dedicated to ensuring that TikTok’s core services run stably, efficiently, and cost-effectively at global scale. We focus on enhancing the observability and operability of our infrastructure and services, using data-driven insights to safeguard business stability 24/7.
Responsibilities:
- Ensure the stability and reliability of TikTok’s core services; respond quickly to production incidents and build mechanisms and platforms to continuously improve incident handling efficiency.
- Define and maintain system quality SLAs through continuous, comprehensive data operations; identify and manage system risks to improve reliability, scalability, and performance.
- Participate in TikTok’s disaster recovery initiatives, including risk assessment, disaster recovery design, capacity planning, and contingency plan development, to strengthen system resilience and fault tolerance.
- Develop and accumulate best practices, tools, and frameworks for operations and maintenance; provide guidance on system architecture design and component selection; produce high-quality technical and operational documentation.