
Founding Machine Learning Infrastructure Engineer

Posted about 7 hours ago

Workplace: Office, Palo Alto

Location: Onsite in Palo Alto

Compensation: Competitive Salary + Equity

About Model AI

Model AI is building the infrastructure and application stack for the next generation of agentic AI systems.

We believe token usage will grow exponentially over the coming years, but routing all inference through closed model providers will remain too expensive for many users and enterprises. Our thesis is that agentic applications require a vertically integrated stack: high-throughput, cost-efficient serving infrastructure paired with an application layer designed for long-running, agentic workloads.

Model AI is building the Agent Cloud, a serving and training infrastructure platform purpose-built for agentic workloads, long-context inference, and large-scale open-source model deployment. By combining infrastructure and application design, we aim to make open-source models significantly more performant, practical, and competitive.

About This Role

We are looking for an ML Systems Engineer to help build and optimize the core serving infrastructure behind Agent Cloud. This role focuses on high-performance inference across different accelerators.

You will work on model serving performance, accelerator utilization, long-context inference, batching, scheduling, KV cache management, runtime efficiency, and cost reduction. This is a deeply technical role at the intersection of ML systems, infrastructure, and product.

Direct TPU experience is a strong plus, but not required. We care most about strong ML systems fundamentals, performance intuition, and the ability to ship reliable systems quickly.

What You'll Do

  • Optimize large-scale LLM inference and serving systems.

  • Increase total and decode tokens per second, reduce latency, and improve throughput and cost efficiency.

  • Work on serving infrastructure for open-source models across different types of accelerators.

  • Improve batching, scheduling, KV cache management, memory usage, and accelerator utilization.

  • Support long-context inference, including workloads targeting context lengths of up to 1M tokens.

  • Debug performance bottlenecks across model execution, runtime, networking, and infrastructure.

  • Work with frameworks such as JAX/XLA, PyTorch, vLLM, SGLang, TensorRT-LLM, or related systems.

  • Collaborate closely with the application team to ensure infrastructure is optimized for agentic workloads, not just generic chatbot inference.

  • Help turn research prototypes into reliable, high-performance production systems.

Qualifications

  • Strong experience in ML systems, distributed systems, or high-performance computing.

  • Experience optimizing inference or training workloads for large models.

  • Familiarity with TPUs, GPUs, or other accelerators.

  • Experience with one or more of CUDA, Triton, NCCL, JAX/XLA, PyTorch internals, vLLM, SGLang, TensorRT-LLM, distributed inference, or distributed training.

  • Strong systems debugging skills.

  • Comfort working across model code, runtime, infrastructure, and product requirements.

  • High ownership and the ability to operate effectively in an early-stage startup environment.

Cultural Fit

  • Hands-on technical excellence and strong engineering judgment.

  • End-to-end ownership, from design to implementation to production outcomes.

  • Bias for action: ship quickly, learn from failures, and iterate.

  • High intensity during critical milestones, with a focus on real customer impact.

  • Ability to do deep, focused work and sustain execution.

  • Clear communication with teammates, customers, and stakeholders.

  • Comfort with ambiguity, rapid change, and wearing multiple hats.

  • Low ego, high integrity, high accountability, and strong collaboration.

  • Continuous learning and a belief that judgment, intelligence, and capability compound over time.

If you are excited to build the infrastructure and agent systems behind the next generation of AI applications, push open-source models to production-grade performance, and turn ambitious research ideas into real-world impact, Model AI is the place for you.

Job details
Workplace: Office
Location: Palo Alto

Model AI builds AI systems for real-world deployment.

Industry: Professional Services
Headquarters: Toronto
Founded: 2020
Company location: Toronto, CA