We're hiring a full-time AI Engineer to own the prompts, agents, evals, and pipelines behind the user-facing features we ship.

You'll take product requirements and turn them into working prompts, agents, and pipelines. You'll evaluate them rigorously, iterate until they're production-ready, and keep improving them once they ship. This role sits at the intersection of product and platform: you decide what the AI should do, prove it works, and get it in front of users.

Because we're an early-stage company moving fast, we're looking for someone who can work quickly through ambiguous AI problems, measure output quality, and ship only when the system is reliable enough for production. The ability to tell the difference between "looks good in the demo" and "works in production" is essential. This is an in-person role, 5 days a week in our office.

Key Responsibilities

Build new AI features end to end, from prototype to production.
Improve AI output quality through prompt engineering, model selection, retrieval, and evaluation.
Design and run evals that measure real output quality, not just first impressions.
Iterate fast on prompts, agent designs, and orchestration patterns.
Partner with the Product Engineer to translate requirements into AI features that actually work.
Partner with the AI Platform team to land features on solid infrastructure.
Evaluate new models, tools, and techniques, and adopt them when they improve quality, latency, cost, or reliability.

What We Are Looking For

Hands-on experience building LLM-powered features that shipped to real users
Production engineering chops in TypeScript/Node (our primary stack, especially on AWS Lambda) and/or Python
Experience with multiple LLM providers such as Anthropic, OpenAI, Google Vertex AI, AWS Bedrock, or similar
Practical judgment in prompt engineering, retrieval, and agent design, backed by evaluation results
Track record of building evaluation systems that actually catch regressions
Solid software engineering fundamentals: you can write production code, not just notebooks

Nice to Have

Experience with provider-abstraction libraries for multi-LLM workflows
Familiarity with pgvector or other vector retrieval systems
Experience with post-training or fine-tuning
Experience deploying AI features on AWS Lambda, ECS Fargate, or similar
Background in ML, NLP, or applied research
Experience with structured output, function calling, and tool use at scale
Experience with Anyscale Ray or similar distributed compute frameworks for batch inference, eval pipelines, or scaling agent workloads
Open source contributions in the LLM or agent tooling space

About Fluency

Fluency builds a platform that captures how work actually happens inside large organizations. We collect observable work data across tools and systems, structure it into a model of how work runs, and use that model to measure productivity, check process conformance, and analyze where AI can do or change the work.

Fluency is looking for an AI Engineer to own AI quality and the prompts, agents, evals, and pipelines behind user-facing features that ship to Fortune 500 users.

Our Customers

Customers include CVS Health, Aon, and PVH.

Location

Full-time, in-person role based in San Francisco, CA.
We offer E-3 visa sponsorship and a relocation stipend for Australian candidates.

This role is not a fit if

You want a hybrid or remote role
You're not comfortable with rapid iteration
You've never operated production pipelines
You dislike constraints (we have them: cost, latency, and reliability tradeoffs are real)
You don't have a good reason for wanting to work at an early-stage company