
About this role
Full-time Senior Software Engineer, Architecture and Infrastructure in AI, at ByteDance in San Jose, California, United States.
At a glance
- Work mode: Office
- Employment: Full Time
- Location: San Jose, California, United States
- Salary: 137k - 360k USD
- Experience: Senior
Core stack
- Disaster Recovery
- Machine Learning
- Optimization
- Architecture
- Performance
- Distributed
- Efficiency
- Design
- Agile
- LLM
- ML
Quick answers
What is the salary range?
The salary range is 137k - 360k USD annually.
What skills are required?
Disaster Recovery, Machine Learning, Optimization, Architecture, Performance, Distributed, Efficiency, Design, Agile, LLM, and more.
San Jose, United States
The Machine Learning (ML) System sub-team combines systems engineering and the art of machine learning to develop and maintain massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, scalable systems for LLM/AIGC/AGI.
On our team, you'll have the opportunity to build a large-scale heterogeneous system that integrates GPU/NPU/RDMA/Storage and keep it running stably and reliably, deepen your expertise in coding, performance analysis, and distributed systems, and take part in the decision-making process. You'll also be part of a global team with members from the United States, China, and Singapore working collaboratively toward a unified project direction.
Key responsibilities include:
1. Participating in online architecture design and optimization centered around deep model inference tasks, achieving high concurrency and throughput in large-scale online systems.
2. Participating in the establishment of a comprehensive system covering stability, disaster recovery, R&D efficiency, and cost, enhancing overall system stability.
3. Participating in the design and implementation of end-to-end online pipeline systems with multiple models, plugins, and storage-computation components, enabling agile, flexible, and observable continuous delivery.
4. Collaborating closely with machine learning engineers (MLEs) to optimize algorithms and systems.
5. Being proactive, optimistic, and highly responsible, demonstrating a meticulous work ethic, and possessing strong team communication and collaboration skills.
The base salary range for this position in the selected city is $136,800 - $359,720 annually.
Job details
Workplace: Office
Location: San Jose, California, United States
Job type: Full Time
Experience: Senior
Salary: 137k - 360k USD per year