
Engineering Tech Lead (AI Infrastructure)

vCluster Labs

Posted about 3 hours ago

As an Engineering Tech Lead (AI Infrastructure) at vCluster Labs, you aren't just a data center engineer; you are the person who builds the foundation our entire AI infrastructure product story runs on. You'll stand up our HGX-based GPU lab from the ground up, racking, cabling, and commissioning NVIDIA HGX systems, and turn that hands-on expertise into reference architectures, product guidance, and institutional knowledge the whole team can build on.

As an Engineering Tech Lead, your role will include:

  • GPU Lab Build-Out: Own the end-to-end setup of vCluster Labs' HGX-based test and lab environment. You'll advise on equipment procurement, select data center colocation, specify cabling and switch configurations, and physically commission two HGX systems (eight GPUs each). This is hands-in-the-data-center work, and you'll own it start to finish.

  • Infrastructure Architecture: Make the critical decisions on network topology, switch selection, and hardware configuration for our HGX infrastructure. You'll know what questions to ask, what to order, and how to get these systems running at full performance.

  • Reference Architecture Development: Translate your hands-on HGX expertise into documented reference architectures that AI Cloud operators and enterprise AI factories can use to deploy their own infrastructure alongside vCluster Platform.

  • Product Influence: Work closely with engineering and product teams to inform design decisions. Your field experience with how GPU infrastructure actually gets built and operated is exactly what we need to ensure our software meets real production requirements.

  • Knowledge Transfer: Teach us what you know. vCluster Labs is building deep expertise in GPU infrastructure, and you'll be the primary source of that knowledge internally, through documentation, hands-on sessions, and day-to-day collaboration.

This role could be a fit for you if you bring:

  • HGX Systems Experience: You have physically provisioned NVIDIA HGX systems, not just managed workloads on top of them. You were in the data center racking, cabling, and commissioning them.

  • Data Center Hands-On Expertise: You know how to specify cabling, connect servers to top-of-rack switches, select colocation facilities, and make infrastructure procurement decisions. This is deeply practical knowledge you've used in production.

  • Networking Depth: You understand the networking requirements that come with dense GPU infrastructure (InfiniBand, high-speed Ethernet, switch configuration) at the level of someone who has actually done it.

  • AI Cloud or HPC Background: You've worked inside an AI Cloud, HPC facility, or enterprise AI factory where you owned infrastructure at the hardware level.

  • Knowledge Sharer: You're energized by teaching others. You document what you know, explain your reasoning, and want the people around you to be able to do what you do.

Bonus points for:

  • Software Depth: Familiarity with Kubernetes, GPU scheduling (MIG, time-slicing), or the software stack that runs on top of HGX systems.

  • Customer-Facing Experience: Experience building or presenting reference architectures directly to enterprise customers.

  • NVIDIA Ecosystem Relationships: Existing connections at NVIDIA or within the AI infrastructure community.

About vCluster Labs

We are a venture-backed tech startup and the company pioneering Kubernetes virtualization for the AI era. We have raised over $30M from top-tier VCs such as Khosla Ventures (first investor in OpenAI, GitLab, Stripe, DoorDash) and are in a hyper-growth phase, looking for motivated people to complement our team. Our headquarters are in San Francisco (Salesforce Tower), but our team is distributed around the globe and we have a remote-first work culture.

We are the leading platform for operating GPU infrastructure, enabling AI Cloud providers to deliver a hyperscaler-like experience to their customers and AI factories that need to build that same experience for their internal teams. Our platform delivers the full operational stack operators need to run their GPU data centers — managed Kubernetes, fast isolated tenant provisioning, and automated node provisioning and lifecycle management — enabling them to accelerate time to value, reduce operational burden, and maximize the ROI of every GPU.

We're the company behind vCluster, an open-source technology for virtualizing Kubernetes (10k+ GitHub stars, 40M+ virtual clusters created since 2021). Open source is part of our DNA. At KubeCon North America 2025, we launched our Infrastructure Tenancy Platform for AI — a Kubernetes-native framework purpose-built for running AI, ML, and GPU-intensive workloads anywhere, with an NVIDIA-validated reference architecture for DGX systems.

Benefits

We offer the following benefits:

  • Competitive Salary: We offer a competitive compensation package, including equity.

  • Platinum-Level Insurance: Health, dental, vision, and life insurance, including plans for you and eligible dependents (benefits vary depending on country).

  • Flexible Working Schedule: You have a doctor’s appointment or need to head to the supermarket to get groceries at 2pm? We won’t have an issue with that. To us, results matter more than clocking in and out at the same time every day.

  • Workplace Flexibility: We’re very flexible about where you work. We know things can change in life and we’re happy to adjust the work environment for you along the way.

Culture & Values

At vCluster Labs, we value and stand for:

  1. Make it Happen: We have a relentless bias for action and the grit to push through obstacles. We do whatever it takes to figure it out, put in the work, and ruthlessly prioritize the actions that drive measurable impact for the business.

  2. Own the Outcome: We understand that our responsibility doesn't end when a task is checked off; it ends when the value is delivered. We connect our daily individual actions to the broader success of the company and our customers.

  3. Create Wow: We measure success by the experience we generate, both inside and outside the company. For our customers, this means impressive speed and intuitive experiences. For our team, this means going the extra mile to support one another and to continuously drive each other to new heights.

  4. Open Source, Open Mind: We are actively contributing to and maintaining open-source projects. Internally, we foster meritocracy — the strongest ideas win, no matter who or where they come from.


Job details

  • Workplace: Hybrid

  • Location: Germany

  • Experience: SE

  • Salary: 180k - 220k USD per year
