Forward Deployed Engineer - AI/ML Platforms

Posted about 24 hours ago

Office | San Francisco | 267k - 287k USD

At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies such as OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the move of AI applications into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

Proud to be backed by Andreessen Horowitz, NEA, and Addition with $250+ million raised to date.

About the role:

As a Forward Deployed Engineer - AI/ML Platforms at Anyscale, you’ll partner with some of the world’s most sophisticated AI organizations to design, deploy, and operate the infrastructure powering their production AI workloads.

In this role you will work directly with customer platform, infrastructure, and ML engineering teams to solve complex technical challenges. You will help customers build scalable AI platforms, modernize ML infrastructure, and operationalize distributed AI applications on Ray and the Anyscale platform.

You will combine deep cloud infrastructure expertise with strong customer engagement skills, serving as both a trusted technical advisor and a hands-on engineer. You will work closely with customer teams throughout implementation, from architecture and deployment through production operations. Your work will provide feedback that directly influences the evolution of the Anyscale platform.

In this role, you will:

  • Design and implement production-grade AI platform architectures on Kubernetes and public cloud infrastructure (AWS, Azure, and GCP).

  • Partner directly with customer platform, infrastructure, and ML engineering teams to deploy, operate, and optimize distributed AI workloads.

  • Lead implementation engagements that include platform installation, networking, security, observability, scaling, upgrades, and operational readiness.

  • Troubleshoot complex distributed systems issues spanning infrastructure, Kubernetes, networking, storage, and AI applications.

  • Develop automation, tooling, reference implementations, and infrastructure-as-code that accelerate customer success and improve repeatability.

  • Build trusted relationships with technical leaders, platform teams, and executive stakeholders, translating business objectives into robust technical solutions.

  • Collaborate closely with Product and Engineering to communicate customer requirements, identify product improvements, and shape future platform capabilities.

  • Share best practices through technical documentation, architecture guidance, workshops, and enablement.

We'd love to hear from you if you have:

  • 5+ years of experience in cloud infrastructure, platform engineering, DevOps, Site Reliability Engineering, or software engineering.

  • Experience building, deploying, or operating ML/AI platforms that support model training, inference, or large-scale data processing workloads.

  • Strong expertise with Kubernetes and containerized production environments.

  • Experience operating cloud infrastructure on AWS, Azure, or GCP, including networking, security, IAM, storage, and infrastructure automation.

  • Experience with Infrastructure as Code and modern DevOps tooling such as Terraform, Helm, GitOps, CI/CD pipelines, or similar technologies.

  • Strong software engineering skills in Python, Go, Java, or a comparable language, with experience building automation or production services.

  • Experience working directly with enterprise customers in consulting, professional services, field engineering, solutions architecture, or another customer-facing engineering role.

  • Excellent communication skills and the ability to work effectively with both executive and deeply technical stakeholders.

  • Familiarity with distributed computing frameworks such as Ray, Spark, Dask, or Kubernetes-native distributed systems is a strong plus.

  • A passion for solving difficult customer problems and building reusable technical solutions.

  • Willingness to travel as needed to work alongside strategic customers.

Job details

  • Workplace: Office

  • Location: San Francisco

  • Salary: 267k - 287k USD per year

Powered by Ray, Anyscale helps AI builders run data-intensive workloads to build and deploy Foundation Models and AI at scale on any cloud.

Key team members

Patrick Lonergan

Lou Serlenga

Angelina Le Grix

Arun Singhal
