Senior Site Reliability Engineer, Gaming - USDS

TikTok

Posted 2 days ago

About this role

We are the Gaming SRE team, responsible for the stability and performance of our global gaming infrastructure. Our mission is to ensure 24/7 seamless gameplay for millions of users through automation, robust monitoring, and rapid incident response. We bridge the gap between engineering and operations to build a reliable and scalable gaming ecosystem.

We are looking for a Site Reliability Engineer who is passionate about building resilient systems that power seamless multiplayer experiences. As an SRE at USDS, you will work at the intersection of game development and infrastructure. You will be responsible for the health, performance, and scalability of our global game servers and backend services, ensuring that players around the world have a lag-free experience around the clock.

Responsibilities
- Scalability & Performance: Design and implement auto-scaling solutions for dedicated game servers (DGS) to handle massive traffic spikes during game launches and seasonal events.
- Infrastructure as Code (IaC): Manage and provision multi-region cloud infrastructure using tools like Terraform or Pulumi.
- Monitoring & Observability: Build robust monitoring dashboards and alerting systems to detect "micro-stutters" or latency issues before they impact the player base.
- Incident Response: Participate in an on-call rotation to troubleshoot and resolve high-priority production issues, followed by blameless post-mortems.
- Cost Optimization: Balance high-performance hardware requirements (such as high-clock-speed CPUs for game simulation) against cloud cost efficiency.
- CI/CD Pipelines: Streamline the deployment of game builds and backend microservices to ensure rapid, safe releases.

Job details

- Workplace: Office
- Location: San Jose, California, United States
- Job type: Full Time
