
Founding ML Infrastructure Engineer
uRun
Posted 1 day ago
The problem we saw
Most AI infrastructure is built for batch: send a query, wait, get a response, reset. Powerful, but transactional. AI is becoming interactive — sessions that hold state, models that stay alive between turns, generation that responds as it runs — and the infrastructure to deliver that at scale doesn't really exist yet.
The bottleneck isn't the models anymore. It's the infrastructure underneath them.
What we're building to fix it
uRun is the inference cloud for interactive AI: the compute layer that makes real-time, stateful inference possible at scale. We came out of stealth in April 2026, are backed by top-tier investors, and are founded by Keegan McCallum, who scaled inference infrastructure for some of the most demanding generative AI workloads in production.
We're an infrastructure company. We build the layer that model labs, builders, and research teams ship on top of.
Where you come in
We are building the next generation of AI inference infrastructure. As our Founding ML Infrastructure Engineer, you will own the architecture and scaling of our GPU compute platform from the ground up.
This is a founding technical hire with end-to-end ownership across the full infrastructure stack, from bare metal to model serving. You will work directly with the founding team and define how we build.
What you'll actually be doing day-to-day
Design and scale our GPU compute platform to support clusters of 1,000+ GPUs, ensuring high availability and low-latency inference across the fleet
Build and maintain the infrastructure layer for our compute marketplace, including multi-tenant scheduling, isolation, and billing-aware resource allocation
Own production reliability for ML systems end-to-end: observability, incident response, and SLA achievement across model serving and infrastructure
Architect feature stores and model registry systems that support rapid iteration and reproducibility at scale
Design experiment-tracking infrastructure capable of handling thousands of concurrent runs with full auditability
Build resource orchestration and scheduling systems that optimise for throughput, cost, and latency across heterogeneous hardware
Set engineering standards for infrastructure reliability, capacity planning, and operational excellence as an early technical leader
What skills you need for the journey
Proven experience designing and operating large-scale distributed infrastructure at 1,000+ nodes or equivalent complexity, in any domain
Deep expertise in distributed systems, cluster orchestration (Kubernetes, Slurm, or custom schedulers), and large-scale resource scheduling
Strong production reliability instincts: observability, incident response, capacity planning, and SLA ownership across complex systems
Experience building infrastructure that other engineers build on top of, not just operating it
Ability to operate as a technical lead: set direction, make tradeoffs under uncertainty, and raise the bar for the team around you
Startup orientation. You are energised by ambiguity, move fast, and build for scale from day one
Things that will give you an edge
Exposure to ML infrastructure concepts: GPU networking (NCCL, InfiniBand, RoCE), model serving frameworks (vLLM, SGLang, TensorRT-LLM), or hardware-aware performance tuning (CuTe, Triton, TileLang)
Experience with multi-cloud GPU procurement and capacity management across AWS, GCP, Azure, and bare metal providers
Familiarity with inference marketplace architectures, dynamic routing, or spot/preemptible workload management
Prior experience at a Series A or earlier stage company scaling from early infrastructure to production
What you'll get in return
Competitive salary and meaningful equity in an early-stage AI infrastructure company. The band above is our target; for an exceptional candidate we'll go higher. Equity is real — you're early, and the grant reflects that.
Health, dental, and vision — full coverage
401(k) — company-supported retirement savings
FSA/HSA — flexible spending and health savings accounts for healthcare costs
Paid time off — we trust you to manage your time
Top-tier tooling — access to the best AI tools available: Claude, Codex, Kimi, and whatever else helps you move faster
MacBook Pro and AirPods — the hardware you need, on us
How we work (and what that feels like day-to-day)
We build the stage, not the show. We're an infrastructure company, a developer-tools company, and a production partner for model labs — and focus is a deliberate choice we've made and hold to.
Day-to-day, that means a small team, a high bar, and real ownership. You won't wait for permission or inherit a backlog of someone else's decisions. In a founding infrastructure role, the function is what you make it.
It also means ambiguity: priorities shift, not everything is documented, and you'll often be the person who decides what "good enough for now" means. That suits some people and not others, and we'd rather you know that before you apply.