Site Reliability Engineer, TikTok Generalized Arch USTO

TikTok

Posted 4 days ago

About this role

TikTok’s Generalized Architecture US Tech and Operations team is dedicated to ensuring that TikTok’s core services run stably, efficiently, and cost-effectively at global scale. We focus on enhancing the observability and operability of our infrastructure and services, using data-driven insights to safeguard business stability 24/7.

Responsibilities:
- Ensure the stability and reliability of TikTok’s core services; respond quickly to production incidents and build mechanisms and platforms that continuously improve incident-handling efficiency.
- Define and maintain system quality SLAs through continuous, comprehensive data operations; identify and manage system risks to improve reliability, scalability, and performance.
- Participate in TikTok’s disaster recovery initiatives, including risk assessment, disaster recovery design, capacity planning, and contingency plan development, to strengthen system resilience and fault tolerance.
- Develop and build up best practices, tools, and frameworks for operations and maintenance; provide guidance on system architecture design and component selection; produce high-quality technical and operational documentation.

Job details

Workplace: Office
Location: San Jose, California, United States
Job type: Full Time