Manager, System Software - Triton Inference Server
NVIDIA.com
224k - 357k USD/year
Office
US, CA, Santa Clara, United States
Full Time
NVIDIA is searching for a passionate Software Engineering Manager to lead the Triton Inference Server team. Triton is cutting-edge, open-source inference software that powers AI deployment across cloud, data center, edge, and embedded devices, supporting models from TensorRT, TensorFlow, PyTorch, ONNX, and more. Join us to shape the future of scalable, production-ready AI solutions used by innovators around the globe.
What You Will Be Doing:
- Guide, mentor, and develop an inclusive and collaborative engineering team focused on delivering robust model serving solutions.
- Drive planning, prioritization, and execution for projects that improve Triton’s scalability, performance, and reliability in non-generative AI deployments.
- Foster partnerships with Product and Program Management to create feature roadmaps, manage cross-team dependencies, and balance project resources for both cloud and on-premises platforms.
- Partner with internal teams and external customers to understand their use cases and translate those needs into product features.
- Promote engineering excellence through modern, agile development practices and a culture of quality and accountability.
What We Need To See:
- Master’s or PhD, or equivalent experience, in Computer Science, Computer Engineering, or a related field.
- Eight or more years of hands-on software development experience in customer-facing environments.
- At least three years building, mentoring, and leading software engineering teams delivering production-grade solutions.
- Deep background in scalable serving architectures, with direct experience building cloud-native inference APIs, REST/gRPC/protobuf-based services, or similar technologies.
- Advanced C/C++ and Python development skills, including clean object-oriented design and proficiency in debugging, performance optimization, and testing.
- Track record of contributing to or leading large open-source projects, including using GitHub for code reviews, bug tracking, and release management.
- Strong knowledge of agile methodologies and tools such as JIRA and Linear.
- Ability to communicate technical topics with clarity and empathy to colleagues, partners, and diverse audiences.
Ways to stand out from the crowd:
- Experience working within distributed, global teams.
- Practical knowledge of machine learning model deployment with frameworks such as TensorRT, TRT-LLM, PyTorch, ONNX, Python, or similar platforms.
- Understanding of CPU and GPU architectures.
- Skills in GPU programming (for example, CUDA or OpenCL).
NVIDIA sets industry standards for innovation, collaboration, and workplace empowerment. Team members are creative, driven, and dedicated to building responsible, real-time solutions that power AI worldwide. If leading scalable AI serving software excites you, join us and thrive in a flexible, inclusive work environment with opportunities for growth and impact.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD. You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until September 23, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.