
Staff Engineer, Engineering Productivity & AI Quality
Harper
The Problem
36 million businesses in America need insurance - it's not optional. 77% are underinsured. 40% have no coverage at all. The distribution system failed them: too slow, too opaque, too confusing.
Over 90% of commercial insurance is still human-led. We're building the inverse: 90%+ AI-led, pushing toward the high 90s. Not by patching legacy workflows - by building AI that makes humans more effective, improves the customer experience, and eliminates friction at every step.
We're adding ~1,000 customers per month. We've grown 100x since last year. We're scaling toward Series B. AI-generated code volume has pulled the scaling problem forward - even with a 20-person engineering team, our coding agents create the surface area, review burden, and architectural drift of a 100-person org.
Build the rails before AI code volume turns every service into a rework trap. If we don't build the rails, the CTO becomes the rail. That doesn't scale.
The Thesis
Every great AI company ends up building the same invisible machine: the harnesses, tests, instructions, and review loops that let a small team ship with impossible leverage. At Harper, that machine is existential. Our agents write code, serve customers, assemble submissions, and make decisions that move revenue. If the rails are strong, twenty engineers can operate like one hundred. If the rails are weak, velocity turns into drag.
This is the founding seat for that machine. You'll turn the CTO's taste into systems: PR preflight, integration tests, architecture rules, agent instructions, eval gates, and feedback loops every engineer feels every day. The mission is simple: make the right way the easy way, and make Harper's engineering org compound with every ship.
The Role
Harper operates like a factory with a series of modules spanning the full lifecycle from intake through renewals. Across them we run a stack of internal AI systems covering operator guidance, the operational backbone that matches risks to underwriters, autonomous communications, and voice AI for customer interactions.
You own the rails underneath the factory - the CI gates, integration test harnesses, agent instructions, PR preflight, architecture linting, dev environment reliability, and dead-code cleanup that the entire engineering team builds against. Three sub-disciplines live under this function:
Harness Engineering - the meta-harness on top of our frontier coding agents, OpenClaw, Hermes, and our internal agents
Developer Experience - CI/CD gates, build caching, merge queues, dev/staging/CI parity, internal developer platform, eval framework infrastructure
AI Quality - eval suite design, golden datasets, LLM-as-judge graders, production trajectory monitoring, drift detection, anti-slop guardrails
What You'll Own
CI/CD quality gates across Harper's most critical services - Define the minimum bar before code can merge
Integration test harnesses anchored to real failure modes - Every repeated operational failure becomes a regression test, a validation, or an architecture rule
The agent harness substrate - Sandbox lifecycle, tool routing, prompt/context layer, model-provider abstraction, multi-agent coordination
Repo-level agent instructions and context hygiene - AGENTS.md per repo, canonical data model docs, banned patterns. The information environment our coding agents read.
Automated PR preflight - Service impact summary, tests run, missing tests, model/migration changes, critical-path warnings. The robot that reviews every PR before a human does.
Architecture-rule enforcement - Custom lints and structural tests that encode the CTO's taste mechanically. Once a rule is written down, it never has to be argued in PR comments again.
Eval framework infrastructure - Pre-merge eval gating, experiment runs against curated datasets, production trajectory monitoring. All three wired together.
Engineering metrics that matter - Rework rate, escaped defects, flaky test count, deploy rollbacks, time-to-confident-ship, AI-generated PR quality. Anti-vanity. Anti-LOC.
You Might Be a Fit If…
You've built or scaled developer productivity, platform, build/test, CI/CD, or internal tooling systems at a high-growth startup or AI-infrastructure company
You can write and review production code at a Staff level - this is not a process or PM role
You have strong opinions about maintainability, architecture, testability, and developer experience - and you back them up with mechanical enforcement, not lectures
You're excited by AI coding agents but skeptical enough to build the guardrails they need
You can describe a specific lint rule, integration test, or eval-harness pattern you built that prevented a class of bugs from reaching production again
You write code with AI daily and routinely manage 3+ parallel coding sessions
You like creating leverage for other engineers more than owning a single product surface
You're 8–12 years into your career, with 3+ years at the Senior+ level
If "Engineering Productivity" sounds like dashboards and roadmaps to you, this isn't it. We measure ourselves on rework prevented and confident-ship time, not artifacts produced.
Requirements
8+ years of software engineering experience, including senior+ scope at a high-growth company
Track record of building developer productivity, platform, CI/CD, build systems, test infrastructure, or internal tooling that other engineers actually adopted
Production AI/ML systems experience - agent harnesses, eval frameworks, LLM-as-judge graders, prompt/context engineering - even if not your primary stack
Strong written communication - RFCs, architecture-rule docs, lint-rule rationale, internal playbooks
Based in San Francisco or willing to relocate
Nice to Have
Built or contributed to eval-framework infrastructure (open-source or internal)
Built developer platforms at an AI-native or high-growth company
Custom lint-rule / structural-test authoring at scale
Built or operated agent harnesses (sandboxing, isolation, agent execution environments)
Worked alongside a CTO whose architectural taste needed to be encoded into mechanical rules
Compensation
OTE: $253,000–$308,000 cash compensation (base salary + target performance bonus)
Equity: competitive equity, so you share in the company you are helping build
Location: San Francisco, in-office
Benefits
Health, dental, and vision insurance
Commuter benefits
Team meals and snacks
The Process
Founder call (15 min) - Mission, pace, scope you'd own
CTO deep-dive (60 min) - Architecture-rule taste, eval-harness depth, real-world examples
Super Day on-site - full-day simulation of working at Harper: code review, eval-harness design, dev-environment debug, cross-functional sessions, and founder/CTO time