Senior Site Reliability Engineer, Trust & Safety - USDS

TikTok.com

Office

Seattle, Washington, United States

Full Time

Team Intro
The USDS TikTok Product Engineering SRE team works with engineering and product teams to build, maintain and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team will deliver on production ownership and be responsible for observability and automation across complex, large-scale service mesh architectures.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
- Manage day-to-day operations of data services and real-time/batch data pipelines, including SLA management, system deployment, performance tuning, and troubleshooting
- Create tools and automation to improve system administration and operation efficiency
- Participate in regular on-call duties
- Engage in and improve the whole lifecycle of services from inception and design, throughout development, capacity planning, and launch reviews, to deployment, operation, and refinement
- Scale systems sustainably through mechanisms such as automation; evolve systems by pushing for changes that improve reliability, efficiency, and velocity
- Practice sustainable user support, incident response, and postmortems