About HelmGuard

We're building agent-native risk infrastructure: risk management and trust building, delivered by AI agents, for a world increasingly run by them. As more decisions and transactions run through agents, the volume of risk to manage and trust to establish is growing rapidly. Today both functions are fragmented, split across internal teams, point products, and outside consultants, and rebuilt from scratch whenever someone needs an answer. HelmGuard brings them onto one platform and runs them continuously: our agents sit on top of proprietary data and reassess as conditions change, rather than at fixed checkpoints. As a result, our customers spend their time deciding and acting on what matters, not assembling the evidence to get there.

Hundreds of billions of dollars are spent on risk management and trust building every year. We expect that spend to move to agent-native solutions over the next five years, and we intend to capture it.

We've grown to seven-figure revenue within months of product launch, on the back of multi-year contracts with leading enterprises in financial services, regulated technology, and healthcare. Our founders come from Palantir and academic institutions: Oxford, Stanford, and ETH. We're backed by leading UK and US institutional investors and exceptional angels from Meta, Isomorphic Labs, Palantir, SpaceXAI, and more.

We're hiring across founding-team roles for people who want outsize impact, the influence over direction and culture that comes only from joining this early, and pre-Series A equity upside.

Your Impact

We already have the best agent scaffolding and orchestration in the trust and risk space. You'll make it the best among all teams shipping enterprise agents, in any vertical. You'll embody the AI-native services thesis our customers are betting on: agents that don't assist with workflows but become the system of action for them.

The Role

You own the agent platform: the orchestration, evals, and reliability work that turns model calls into product features customers trust. The bar is not that the demo works, but that a domain expert reading our agent's output judges it to be the work of a peer. You own the technical delivery that makes that possible.

This isn't a research role at its core: we consume frontier APIs and make them production-grade. We push them hard, though, hard enough that we recently found and reported a bug in the Anthropic API that took their engineers weeks to reproduce. At that level, the line between using these models and studying them gets thin, so if research-flavoured work pulls at you, there's room to follow it.

What You Will Do

Agent scaffolding: tool use, context management, sandboxing, prompt-injection defence
Evals for fuzzy, high-stakes outputs: assessments, policy interpretation, control mapping
Reliability infrastructure: retries, fallbacks, circuit breakers, prompt versioning
The internal standard for what "good enough to ship" means for AI features here

What You Bring

Backend engineering experience in TypeScript or a comparable language, including at least 1–2 years shipping production LLM features
Experience with agent frameworks, tool calling, and multi-step orchestration
Production evals chops: dataset curation, LLM-as-judge failure modes, regression testing under model swaps
Strong systems thinking: async, queues, idempotency
Comfort being the named owner of AI quality, including saying no when needed

Nice to Have

Anthropic, OpenAI, or open-weight APIs in production at scale
Prompt-injection or agent-security work
Background in compliance, audit, or any domain where correctness is fuzzy and stakes are high

Culture and Values

We value a diversity of perspectives and experiences. We also hold a small set of core beliefs that reflect how we operate, and share them transparently with candidates so the fit is clear from the outset.

Put Customers First. Our customers buy outcomes from us, not features. We judge every decision by whether it delivers on that promise.

Take Ownership. At the founding stage, problems don't come pre-scoped. When you see something that needs doing, you scope it, ship it, and own the outcome. We expect this from everyone, and we give you the backing to execute on it.

Work Hard, With Gratitude. This is the most consequential window for building enterprise software in a generation. We work hard because the opportunity is rare, and we do it with gratitude for the moment, for the people we get to build it with, and for the customers willing to bet on us this early.

Say the Silly Thing. The best ideas usually start out sounding half-baked, so we'd rather you say the silly thing than sit on it. We want you opinionated and willing to argue, and just as willing to change your mind when someone makes a better case. Disagreement here is a contribution, not a risk.

Working at HelmGuard

Location. King's Cross, London (Gridiron building). We're built around in-person collaboration and expect most days to be in-office, with flexibility for the days that need it.

Compensation. Top decile for the London market, with meaningful EMI-eligible options.

Perks. Daily team lunch and specialty coffee, a roof terrace overlooking King's Cross, on-site showers for those who enjoy active commuting, and serious per-engineer AI tooling and API budgets.

Interview process. Three stages: behavioural phone screen, technical phone screen, and a paid on-site work trial. Target turnaround is under two weeks from first conversation.

Tech Stack. TypeScript, Node.js, React, Tailwind, OpenAPI, Express, Azure (Container Apps, Service Bus, Front Door, Entra ID), Postgres, Terraform, GitHub Actions, Docker. Anthropic-first AI with in-house evals and scaffolding. Claude Code throughout.

Founding Engineer, Agent Systems