Founding Machine Learning Infrastructure Engineer
Location: Onsite in Palo Alto
Compensation: Competitive Salary + Equity
About Model AI
Model AI is building the infrastructure and application stack for the next generation of agentic AI systems.
We believe token usage will grow exponentially over the coming years, but routing all inference through closed model providers will remain too expensive for many users and enterprises. Our thesis is that agentic applications require a vertically integrated stack: high-throughput, cost-efficient serving infrastructure paired with an application layer designed for long-running, agentic workloads.
Model AI is building the Agent Cloud, a serving and training infrastructure platform purpose-built for agentic workloads, long-context inference, and large-scale open-source model deployment. By combining infrastructure and application design, we aim to make open-source models significantly more performant, practical, and competitive.
About This Role
We are looking for an ML Systems Engineer to help build and optimize the core serving infrastructure behind Agent Cloud. This role focuses on high-performance inference across different accelerators.
You will work on model serving performance, accelerator utilization, long-context inference, batching, scheduling, KV cache management, runtime efficiency, and cost reduction. This is a deeply technical role at the intersection of ML systems, infrastructure, and product.
Direct TPU experience is a strong plus, but not required. We care most about strong ML systems fundamentals, performance intuition, and the ability to ship reliable systems quickly.
What You'll Do
Optimize large-scale LLM inference and serving systems.
Improve total tokens per second, decode tokens per second, latency, throughput, and cost efficiency.
Work on serving infrastructure for open-source models across different types of accelerators.
Improve batching, scheduling, KV cache management, memory usage, and accelerator utilization.
Support long-context inference, including workloads targeting context lengths of up to 1M tokens.
Debug performance bottlenecks across model execution, runtime, networking, and infrastructure.
Work with frameworks such as JAX/XLA, PyTorch, vLLM, SGLang, TensorRT-LLM, or related systems.
Collaborate closely with the application team to ensure infrastructure is optimized for agentic workloads, not just generic chatbot inference.
Help turn research prototypes into reliable, high-performance production systems.
Qualifications
Strong experience in ML systems, distributed systems, or high-performance computing.
Experience optimizing inference or training workloads for large models.
Familiarity with TPUs, GPUs, or other accelerators.
Experience with one or more of CUDA, Triton, NCCL, JAX/XLA, PyTorch internals, vLLM, SGLang, TensorRT-LLM, distributed inference, or distributed training.
Strong systems debugging skills.
Comfort working across model code, runtime, infrastructure, and product requirements.
High ownership and the ability to operate effectively in an early-stage startup environment.
Cultural Fit
Hands-on technical excellence and strong engineering judgment.
End-to-end ownership, from design to implementation to production outcomes.
Bias for action: ship quickly, learn from failures, and iterate.
High intensity during critical milestones, with a focus on real customer impact.
Ability to do deep, focused work and sustain execution.
Clear communication with teammates, customers, and stakeholders.
Comfort with ambiguity, rapid change, and wearing multiple hats.
Low ego, high integrity, high accountability, and strong collaboration.
Continuous learning and a belief that judgment, intelligence, and capability compound over time.
If you are excited to build the infrastructure and agent systems behind the next generation of AI applications, push open-source models to production-grade performance, and turn ambitious research ideas into real-world impact, Model AI is the place for you.