Backend Engineer
Bespoke Labs
Infrastructure Engineer at Bespoke Labs, designing and building the execution layer for RL environments and collaborating with research and data teams to enhance the reliability and performance of their AI agents.
Tech Stack
Tools & technologies: AWS, Cloud, Distributed Systems, Go, Google Cloud Platform, Node.js, Python, Rust
About the role
Key responsibilities & impact
- Design and own the sandboxing and execution layer that environments run inside.
- Build systems to snapshot and restore environment state (disk, process, and where relevant memory and accelerator state) so runs can be paused, resumed, inspected, and branched rather than executed once.
- Develop the machinery to detect failure modes early in a rollout (reward hacks, infra faults, fairness issues) and to revert to a known-good state, patch, and continue.
- Extend execution to long-horizon and multi-node environments, where an agent operates across many tools and services over hours or days.
- Own the performance characteristics of the platform: throughput, latency, and cost-per-rollout at scale.
- Drive utilization and scheduling so we can run far more environment rollouts per dollar without sacrificing reliability.
- Profile and remove bottlenecks across the stack, from container startup to environment teardown.
- Build the observability that lets us understand what's happening inside thousands of concurrent, long-running rollouts.
- Build and maintain the framework for specifying, packaging, and deploying RL environments, which is used internally by both humans and agents authoring environments.
- Create the tooling that lets researchers and environment authors debug a specific failure across hundreds of long agent traces.
- Scale prototypes into production systems with reproducible workflows and high engineering standards.
- Write the documentation and tools that let internal teams and external users build on the platform.
Requirements
What you’ll need
- Strong track record building production systems or research infrastructure at scale: distributed systems, execution engines, container/sandboxing infrastructure, or similar.
- Deep comfort with the systems layer: containers and isolation (e.g. namespaces, cgroups, VMs, gVisor/Firecracker-style sandboxing), filesystems, process and state management.
- Experience making systems fast and cheap: profiling, scheduling, resource utilization, and cost optimization at scale.
- Proficiency with cloud platforms (GCP, AWS) and distributed computing.
- Strong engineering fundamentals and a systematic approach to testing, validation, and reliability.
- Comfort operating in ambiguity.
- Strong Python skills; comfort in a systems language (Rust, Go, or C++) is a plus.
- Ability to use modern tools such as Claude Code effectively.
- Excellent communication skills for working with research teams and enterprise customers.
- Ability to translate between research needs and infrastructure requirements.
- Comfortable presenting technical work to diverse audiences.
- Experience with RL training or evaluation infrastructure, or the execution layer for agent rollouts.
- Experience with checkpoint/snapshot-restore systems, CRIU, or distributed state management.
- Background in high-throughput, low-latency execution systems.
- Contributions to widely-used infrastructure, datasets, benchmarks, or open-source systems.
- Previous experience in a research engineering or infrastructure role at an AI or systems-heavy company.
Benefits
Comp & perks
- Health coverage
- Opportunity to work directly with the world's leading AI research labs
ATS Keywords
Hard Skills & Tools
Python, Rust, Go, C++, distributed systems, execution engines, container infrastructure, profiling, scheduling, resource utilization
Soft Skills
communication, systematic approach, comfort in ambiguity, ability to translate between research needs and infrastructure requirements, presentation skills