Infrastructure Site Reliability Engineer (Entry Level) - USDS
TikTok.com
Office
San Jose, California, United States
Full Time
Team Introduction
Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you’ll have the opportunity to manage the complex challenges of scale, while using expertise in coding, algorithms, complexity analysis, and large-scale system design. We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction.
On-site presence across teams allows the company to operate with greater speed, alignment, and agility — especially in areas like real-time decision-making, team development, and integrated execution. As such, the company is shifting from a hybrid work model to a fully in-person schedule up to 5 days a week.
Responsibilities
- Engage in and improve the whole lifecycle of services, from inception and design through development, capacity planning, and launch reviews to deployment, operation, and automation
- Design and implement various dashboards and monitoring frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance
- Scale systems elastically through mechanisms such as automation; evolve systems by pushing for changes that improve reliability, efficiency, and velocity
- Practice efficient customer support, incident response, and blameless postmortems
