
Deep Learning Performance Architect - Perf Tools

NVIDIA.com

Office

China, Shanghai

Full Time

We are looking for a first-class Deep Learning Performance Architect to help shape the performance analysis infrastructure for GPUs. We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, spanning pre-silicon architectural exploration through post-silicon validation and optimization. Your work will directly shape the tools that define how NVIDIA GPUs are analyzed, tuned, and scaled for next-gen AI systems, and will influence next-gen GPU architectures.

What You'll Be Doing:

  • Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
  • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
  • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
  • Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.

What We Need To See:

  • BS+ in Computer Science, Electronic Engineering, or a related field (or equivalent experience).
  • 4+ years of software development experience.
  • Strong software skills in design and coding (C++ and Python), plus strong analytical and debugging skills for low-level programs.
  • Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals.
  • Experience with performance modeling, architecture simulation, profiling, and analysis.
  • Self-starter who thrives in dynamic environments and manages competing priorities effectively. 

Ways To Stand Out From The Crowd:

  • Experience building performance debugging and analysis tools on silicon and simulators. Experience developing application snapshot and replay tools is a big plus.
  • Familiarity with the CUDA system software stack (e.g., CUDA Driver/Runtime APIs) and CUDA kernel optimization, and an understanding of GPU architecture.
  • Familiarity with GPU performance profiling tools such as Nsight Systems, Nsight Compute, and NVTX, or experience developing similar tools for other processors.
  • Practical experience or projects demonstrating AI/ML-based code generation, automated data analysis, or workflow assistants.


September 18, 2025
