Staff Engineer, Engineering Productivity & AI Quality

Harper

Posted about 1 hour ago

The Problem

36 million businesses in America need insurance - it's not optional. 77% are underinsured. 40% have no coverage at all. The distribution system failed them: too slow, too opaque, too confusing.

Over 90% of commercial insurance is still human-led. We're building the inverse: 90%+ AI-led, pushing toward the high 90s. Not by patching legacy workflows - by building AI that makes humans more effective, improves the customer experience, and eliminates friction at every step.

We're adding ~1,000 customers per month. We've grown 100x since last year. We're scaling toward Series B. AI-generated code volume has pulled the scaling problem forward - even with a 20-person engineering team, our coding agents create the surface area, review burden, and architectural drift of a 100-person org.

Build the rails before AI code volume turns every service into a rework trap. If we don't build the rails, the CTO becomes the rail. That doesn't scale.

The Thesis

Every great AI company ends up building the same invisible machine: the harnesses, tests, instructions, and review loops that let a small team ship with impossible leverage. At Harper, that machine is existential. Our agents write code, serve customers, assemble submissions, and make decisions that move revenue. If the rails are strong, twenty engineers can operate like one hundred. If the rails are weak, velocity turns into drag.

This is the founding seat for that machine. You'll turn the CTO's taste into systems: PR preflight, integration tests, architecture rules, agent instructions, eval gates, and feedback loops every engineer feels every day. The mission is simple: make the right way the easy way, and make Harper's engineering org compound with every ship.

The Role

Harper operates like a factory with a series of modules spanning the full lifecycle from intake through renewals. Across them we run a stack of internal AI systems covering operator guidance, the operational backbone that matches risks to underwriters, autonomous communications, and voice AI for customer interactions.

You own the rails underneath the factory - the CI gates, integration test harnesses, agent instructions, PR preflight, architecture linting, dev environment reliability, and dead-code cleanup that the entire engineering team builds against. Three sub-disciplines live under this function:

  1. Harness Engineering - the meta-harness on top of our frontier coding agents, OpenClaw, Hermes, and our internal agents

  2. Developer Experience - CI/CD gates, build caching, merge queues, dev/staging/CI parity, internal developer platform, eval framework infrastructure

  3. AI Quality - eval suite design, golden datasets, LLM-as-judge graders, production trajectory monitoring, drift detection, anti-slop guardrails

What You'll Own

  • CI/CD quality gates across Harper's most critical services - Define the minimum bar before code can merge

  • Integration test harnesses anchored to real failure modes - Every repeated operational failure becomes a regression test, a validation, or an architecture rule

  • The agent harness substrate - Sandbox lifecycle, tool routing, prompt/context layer, model-provider abstraction, multi-agent coordination

  • Repo-level agent instructions and context hygiene - AGENTS.md per repo, canonical data model docs, banned patterns. The information environment our coding agents read.

  • Automated PR preflight - Service impact summary, tests run, missing tests, model/migration changes, critical-path warnings. The robot that reviews every PR before a human does.

  • Architecture-rule enforcement - Custom lints and structural tests that encode the CTO's taste mechanically. Once a rule is written down, it never has to be argued in PR comments again.

  • Eval framework infrastructure - Pre-merge eval gating, experiment runs against curated datasets, production trajectory monitoring. All three wired together.

  • Engineering metrics that matter - Rework rate, escaped defects, flaky test count, deploy rollbacks, time-to-confident-ship, AI-generated PR quality. Anti-vanity. Anti-LOC.

You Might Be a Fit If…

  • You've built or scaled developer productivity, platform, build/test, CI/CD, or internal tooling systems at a high-growth startup or AI-infrastructure company

  • You can write and review production code at a Staff level - this is not a process or PM role

  • You have strong opinions about maintainability, architecture, testability, and developer experience - and you back them up with mechanical enforcement, not lectures

  • You're excited by AI coding agents but skeptical enough to build the guardrails they need

  • You can describe a specific lint rule, integration test, or eval-harness pattern you built that prevented a class of bugs from reaching production again

  • You write code with AI daily and routinely manage 3+ parallel coding sessions

  • You like creating leverage for other engineers more than owning a single product surface

  • You're 8–12 years into your career, with 3+ years at the Senior+ level

If "Engineering Productivity" sounds like dashboards and roadmaps to you, this isn't it. We measure ourselves on rework prevented and confident-ship time, not artifacts produced.

Requirements

  • 8+ years of software engineering experience, including Senior+ scope at a high-growth company

  • Track record of building developer productivity, platform, CI/CD, build systems, test infrastructure, or internal tooling that other engineers actually adopted

  • Production AI/ML systems experience - agent harnesses, eval frameworks, LLM-as-judge graders, prompt/context engineering - even if not your primary stack

  • Strong written communication - RFCs, architecture-rule docs, lint-rule rationale, internal playbooks

  • Based in San Francisco or willing to relocate

Nice to Have

  • Built or contributed to eval-framework infrastructure (open-source or internal)

  • Built developer platforms at an AI-native or high-growth company

  • Custom lint-rule / structural-test authoring at scale

  • Built or operated agent harnesses (sandboxing, isolation, agent execution environments)

  • Worked alongside a CTO whose architectural taste needed to be encoded into mechanical rules

Compensation

  • OTE: $253,000–$308,000 cash compensation (base salary + target performance bonus)

  • Equity: competitive equity, so you share in the company you are helping build

  • Location: San Francisco, in-office

Benefits

  • Health, dental, and vision insurance

  • Commuter benefits

  • Team meals and snacks

The Process

  1. Founder call (15 min) - Mission, pace, scope you'd own

  2. CTO deep-dive (60 min) - Architecture-rule taste, eval-harness depth, real-world examples

  3. Super Day on-site - full-day simulation of working at Harper: code review, eval-harness design, dev-environment debug, cross-functional sessions, and founder/CTO time

Job details

  • Workplace: Office

  • Location: San Francisco

  • Experience: SE

  • Salary: 253k - 308k USD per year
