Sr Technical Project Manager (AI Token Factory)

Nebius.com

Office

Amsterdam, Netherlands

Full Time

Why work at Nebius
Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.

Where we work
Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 800 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.

About The Team

The TokenFactory team is building a high-performance inference platform for large-scale, production use of LLMs. Our mission is to make powerful models easy to consume via stable APIs while meeting strict requirements on latency, reliability, and cost efficiency.
The platform runs GPU-intensive workloads at scale and integrates deeply with the Nebius Cloud infrastructure, networking, observability, and capacity planning systems.

We are looking for a Technical Program Manager (TPM) to drive cross-team execution for the inference platform as it scales in usage, regions, and complexity.

What You’ll Do

As a TPM for the AI Studio inference platform, you will own end-to-end delivery of complex, cross-functional initiatives that span infrastructure, platform engineering, hardware, and customer-facing teams.

You Will:

  • Drive large, cross-team programs related to platform scaling, reliability, performance, and cost efficiency
  • Coordinate work across AI Studio engineers and the Cloud Platform and Observability teams
  • Translate product and customer requirements (latency, throughput, SLAs, cost) into executable technical plans
  • Define clear scope, milestones, dependencies, and success metrics for multi-quarter initiatives
  • Unblock teams by driving decisions on architecture trade-offs, rollout strategies, and operational processes
  • Track and communicate risks, incidents, and dependencies to stakeholders at both engineering and leadership levels
  • Introduce and scale repeatable processes for launches, capacity planning, incident reviews, and platform changes
  • Support execution around model rollouts, autoscaling changes, GPU capacity expansion, and regional launches

What We Expect

  • 5+ years of experience as a TPM (or equivalent role) leading cross-team technical programs
  • Strong technical foundation in cloud platforms, distributed systems, and production infrastructure
  • Practical understanding of Kubernetes-based platforms, service reliability, and observability (metrics, logs, traces)
  • Experience driving execution where you influence without formal authority
  • Ability to reason about system-level trade-offs (latency vs cost, reliability vs utilization)
  • Strong written and verbal communication skills; comfortable working with engineers and senior stakeholders
  • Analytical mindset with hands-on experience using data (SQL, Python, or scripting) to track progress and inform decisions

Nice To Have / Ways To Stand Out

  • Prior background as a Software Engineer, SRE, or Systems Engineer
  • Experience working with GPU-based workloads or high-throughput inference systems
  • Familiarity with LLM serving stacks (e.g. vLLM, TensorRT-LLM) or ML platform environments
  • Experience running programs tied to capacity planning, autoscaling, or multi-region deployments
  • Exposure to environments operating under strict SLOs / SLAs and fast incident response loops

What We Offer

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Flexible working arrangements.
  • A dynamic and collaborative work environment that values initiative and innovation.

We’re growing and expanding our products every day. If you’re up to the challenge and are as excited about AI and ML as we are, join us!

December 16, 2025