Senior Site Reliability Engineer, Trust & Safety - USDS
TikTok.com
Office
Seattle, Washington, United States
Full Time
Team Intro
The USDS TikTok Product Engineering SRE team works with engineering and product teams to build, maintain and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team will deliver on production ownership and be responsible for observability and automation across complex, large-scale service mesh architectures.
To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.
Responsibilities:
- Manage day-to-day operations of data services and real-time/batch data pipelines, including SLA management, system deployment, performance tuning, and troubleshooting
- Create tools and automation to improve system administration and operation efficiency
- Participate in regular on-call duties
- Engage in and improve the whole lifecycle of services from inception and design, throughout development, capacity planning, and launch reviews, to deployment, operation, and refinement
- Scale systems sustainably through mechanisms such as automation; improve system reliability, efficiency, and velocity by pushing for changes
- Practice sustainable user support, incident response, and postmortems
October 6, 2025