Job Description
This isn’t a traditional senior engineering role. You won’t spend most of your time implementing product features directly.
Your time will split roughly as follows:
- 50% building and evolving agent harnesses: orchestration, toolchains, approvals, secure execution, managed agents
- 50% reviewing and improving outputs: tracing failures, improving prompts/steering, tightening eval harnesses, reducing loop count
Concretely, you’ll:
- Design and implement agentic workflows that take a requirement from spec → code → review → deploy
- Build agentic loops that turn mistakes into system-level improvements (not one-off fixes)
- Develop evaluation harnesses (offline + CI) that detect regressions in behavior, not just failures in code-level tests
- Define and maintain review gates (human-in-the-loop + automated reviewers) for risky changes
- Improve tool reliability: schemas, typed tool interfaces, retries, timeouts, safety checks
- Build platform capabilities for managed agents: long-running sessions, checkpoints, state/memory boundaries, and recovery
- Evolve the platform architecture (TypeScript, serverless architecture, shared codebase) with an eye for simplicity and maintainability
- Partner with Product to reduce ambiguity and translate intent into testable, evaluable specs
Qualifications
This role requires strength in two areas, equally:
- Systems thinking for agent harnesses and loops. You can design the execution harness around agents: feedback loops, evaluation strategy, safety constraints, and the “glue code” that makes autonomy safe in production.
- Engineering taste. You can look at agent-generated code and immediately judge its conventions, simplicity, correctness, maintainability, and security. Not just “does it work,” but “would I approve this PR in a regulated product?”
What we need from you
- Strong TypeScript and React experience in production environments
- You’ve shipped real software to real users (not just prototypes)
- You can read a codebase and quickly identify its patterns, conventions, and architecture
- You are comfortable working in ambiguity and turning fuzzy intent into clear acceptance criteria + evals
- Familiarity with agent tooling concepts: tool calling, MCP/tool integration, guardrails, evals, tracing/observability, and permissioning
- Nice to have: AWS serverless experience (CDK, Lambda, DynamoDB). Our backend is a mix of modern serverless microservices and a legacy Express/PostgreSQL monolith.
Who this role is not for
Be honest with yourself:
- If you want to spend most of your time building features directly, this role will frustrate you
- If you’re excited about AI but haven’t shipped production software, you won’t have the taste to judge agent output
- If you prefer stable scope, established best practices, and minimal ambiguity, this environment won’t be a match
The team and company
You’ll join a small team (3–4 engineers) reporting to a hands-on CTO. The whole company is going all-in on this model, not just engineering: sales, marketing, and support are all building agentic workflows for their functions.
This isn’t a side experiment; it’s our operating model.
Additional Information
We’re guided by trust, respect, and ownership. Our values (Embrace Change, Carte Blanche, Find Wisdom in Data, and We All “Own It”) shape how we work.
- Fully remote (work from anywhere in Australia)
- 5 weeks annual leave and flexible working
- Monthly Wellness Budget (mental & physical health)
- Employee share options (ESOP) for all team members
How to apply?
Submit your CV via the application form. Note that background checks are required as part of our offer process.
We welcome applications from all backgrounds, abilities, and identities. We value diversity and believe that it enhances our creativity, innovation, and overall success. Join us in creating a workplace where everyone can thrive.