
Backend Engineer
Posted about 9 hours ago
About Bespoke Labs
Bespoke Labs is an applied AI research lab pioneering data and RL environment curation for training and evaluating agents.
Recently, we curated Open Thoughts, one of the best open reasoning datasets used by multiple frontier labs, trained SOTA specialized models such as Bespoke-MiniChart-7B and Bespoke-MiniCheck, and built the environment infrastructure that frontier labs and enterprises use to make their agents reliable.
Bespoke is uniquely positioned to capture a large share of the market for data and RL environment curation.
About the Role
We're looking for an Infrastructure Engineer to own the execution layer beneath our RL environments: the systems that let an agent operate coherently inside a realistic, multi-tool world for hours or days.
This is a hard systems problem disguised as an AI job. As the tasks agents can complete keep lengthening, the environments that train them have to stay coherent across far longer horizons than anything that exists today. That means sandboxing and isolation you can trust, execution that's fast and cheap enough to run at training scale, and the ability to snapshot, restore, inspect, and branch a running environment instead of treating every rollout as one-shot. You'll build the platform that makes all of this possible.
You'll work closely with our research and data teams, and directly with frontier labs and enterprise customers, to turn environment designs into infrastructure that runs reliably in production.
What You'll Do
Environment Execution & Sandboxing
Design and own the sandboxing and execution layer that environments run inside. Build systems to snapshot and restore environment state (disk, process, and, where relevant, memory and accelerator state) so runs can be paused, resumed, inspected, and branched rather than executed once.
Develop the machinery to detect failure modes early in a rollout (reward hacks, infra faults, fairness issues) and to revert to a known-good state, patch, and continue.
Extend execution to long-horizon and multi-node environments, where an agent operates across many tools and services over hours or days.
Performance & Scale
Own the performance characteristics of the platform: throughput, latency, and cost-per-rollout at scale.
Drive utilization and scheduling so we can run far more environment rollouts per dollar without sacrificing reliability.
Profile and remove bottlenecks across the stack, from container startup to environment teardown.
Build the observability that lets us understand what's happening inside thousands of concurrent, long-running rollouts.
Environment Platform
Build and maintain the framework for specifying, packaging, and deploying RL environments, which is used internally by both humans and agents who author environments.
Create the tooling that lets researchers and environment authors debug a specific failure across hundreds of long agent traces.
Collaboration & Production Excellence
Scale prototypes into production systems with reproducible workflows and high engineering standards.
Write the documentation and tools that let internal teams and external users build on the platform.
What We're Looking For
Systems & Infrastructure
Strong track record building production systems or research infrastructure at scale: distributed systems, execution engines, container/sandboxing infrastructure, or similar.
Deep comfort with the systems layer: containers and isolation (e.g. namespaces, cgroups, VMs, gVisor/Firecracker-style sandboxing), filesystems, process and state management.
Experience making systems fast and cheap: profiling, scheduling, resource utilization, and cost optimization at scale.
Proficiency with cloud platforms (GCP, AWS) and distributed computing.
Strong engineering fundamentals and a systematic approach to testing, validation, and reliability.
Execution & Ownership
Comfort operating in ambiguity.
Strong Python skills; comfort in a systems language (Rust, Go, or C++) is a plus.
Ability to use modern tools such as Claude Code effectively.
Collaboration & Communication
Excellent communication skills for working with research teams and enterprise customers.
Ability to translate between research needs and infrastructure requirements.
Comfortable presenting technical work to diverse audiences.
Nice to Have
Experience with RL training or evaluation infrastructure, or the execution layer for agent rollouts.
Experience with checkpoint/snapshot-restore systems, CRIU, or distributed state management.
Background in high-throughput, low-latency execution systems.
Contributions to widely used infrastructure, datasets, benchmarks, or open-source systems.