The Role
We’re hiring a Forward Deployed Engineer to own the design, development, and operationalization of reinforcement learning environments. You’ll build the sandboxed, reproducible execution environments that AI agents interact with during training and evaluation—things like terminal-based task benchmarks, browser and computer-use environments, and tool-augmented agentic workspaces.
This is a hands-on engineering role. You’ll write production-quality infrastructure code, integrate with open-source RL tooling, and work closely with our data operations team to ensure environments are robust, observable, and ready for human annotators and model agents alike. You won’t be doing ML research, but you’ll need to deeply understand how RL training loops consume environments and where the bottlenecks live.
What You’ll Do
- Design, build, and maintain sandboxed RL environments for agentic AI training, including terminal emulators, browser automation harnesses, computer-use simulators, and tool-augmented workspaces (e.g., environments built on TerminalBench, OSWorld, and Tau-bench)
- Develop reproducible, containerized execution environments (Docker, VMs, lightweight sandboxes) that support deterministic task rollouts and reward signal collection (a minimal rollout sketch follows this list)
- Integrate with and extend open-source agentic tooling and custom CLI/API harnesses to enable multi-step agent interaction
- Build instrumentation and observability layers—structured logging, trajectory capture, state snapshotting—so training runs and human annotation sessions produce clean, auditable data
- Collaborate with data operations to design task curricula and evaluation protocols that stress-test model capabilities across environment types
- Own environment deployment and reliability: CI/CD pipelines, automated testing of environment configurations, and monitoring for drift or breakage across versions
- Rapidly prototype new environment types as client and internal requirements evolve, moving from spec to working system in days, not weeks
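To make the rollout and instrumentation bullets above concrete, here is a minimal sketch of a single containerized action step with trajectory capture. It is illustrative only: the image name (`alignerr/fs-task:pinned`), the log path, and the record schema are all hypothetical, and a production harness would keep one container alive per episode rather than launching one per action.

```python
import json
import subprocess
import time
from pathlib import Path

TRAJECTORY_LOG = Path("trajectories.jsonl")  # hypothetical log location

def run_step(container_image: str, command: str, timeout_s: int = 60) -> dict:
    """Run one agent action in a fresh, network-isolated container and
    return a structured record of what happened."""
    started = time.time()
    proc = subprocess.run(
        ["docker", "run", "--rm", "--network=none", container_image,
         "bash", "-lc", command],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return {
        "action": command,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "wall_time_s": round(time.time() - started, 3),
    }

def log_step(record: dict) -> None:
    """Append one step to a JSONL trajectory log for later auditing."""
    with TRAJECTORY_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    # Hypothetical task image, pinned so every rollout starts identically.
    step = run_step("alignerr/fs-task:pinned", "ls -la /workspace")
    log_step(step)
```

Starting every rollout from a pinned image is one simple route to a deterministic initial state; the JSONL records are the kind of auditable artifact that reward checks and annotation review consume downstream.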
What We’re Looking For
Required
- 2+ years of professional software engineering experience, with strong fundamentals in Python and at least one systems-level language (Go, Rust, C++)
- Demonstrated experience with containerization and sandboxing (Docker, Podman, Firecracker, or similar) in production or near-production contexts
- Familiarity with RL concepts: MDPs, reward shaping, episode structure, observation/action spaces. You don’t need to have trained models, but you need to understand what an environment must provide to an RL training loop (see the sketch after this list)
- Experience building or maintaining developer tooling, CLI tools, or infrastructure automation
- Comfort working with browser automation frameworks or terminal interaction tooling
- Strong debugging instincts—you can trace failures across process boundaries, container layers, and network calls
- Ability to read and implement from academic papers and open-source benchmark repositories without extensive hand-holding
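If “what an environment must provide” sounds abstract, here is a minimal sketch of the contract, assuming the Gymnasium interface (one common env API; the task logic below is a stand-in, not one of our real tasks):

```python
import gymnasium as gym
from gymnasium import spaces

class ToyShellEnv(gym.Env):
    """Toy episodic environment: the agent picks a shell command by index
    and is rewarded for reaching the goal command. Stand-in task logic."""

    COMMANDS = ["ls", "pwd", "cat README.md", "make test"]  # hypothetical action set

    def __init__(self, max_steps: int = 8):
        super().__init__()
        # Observation: index of the last command run; a sentinel value at reset.
        self.observation_space = spaces.Discrete(len(self.COMMANDS) + 1)
        # Action: which command to run next.
        self.action_space = spaces.Discrete(len(self.COMMANDS))
        self.max_steps = max_steps

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random for reproducibility
        self.steps = 0
        return len(self.COMMANDS), {}  # (observation, info)

    def step(self, action):
        self.steps += 1
        # Sparse reward: 1.0 only when the goal command is chosen.
        reward = 1.0 if self.COMMANDS[action] == "make test" else 0.0
        terminated = reward > 0.0                 # success ends the episode
        truncated = self.steps >= self.max_steps  # step budget also ends it
        return action, reward, terminated, truncated, {}

# A training loop only ever sees this surface:
env = ToyShellEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```

Everything this role builds (containers, harnesses, instrumentation) lives behind that reset()/step() boundary.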
Preferred
- Direct experience building or contributing to RL environments (Gymnasium/Gym, PettingZoo, or custom environment implementations)
- Experience with agentic AI evaluation frameworks (SWE-bench, WebArena, OSWorld, TerminalBench, or similar; see the browser sketch after this list)
- Familiarity with GCP or AWS infrastructure (e.g., Compute Engine and Cloud Build on GCP; ECS/EKS on AWS)
- Prior work at an AI data company, ML platform company, or AI research lab
- Contributions to open-source projects in the RL, agents, or dev-tools space
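For a flavor of the browser side, a single observation/action step might look like the following sketch, assuming Playwright (one common automation choice; the URL and selector are placeholders):

```python
from playwright.sync_api import sync_playwright

def browser_step(url: str, selector: str) -> dict:
    """One illustrative browser-env step: navigate, act, and return an
    observation the agent (or a reward function) can consume."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.click(selector)  # the "action"
        observation = {
            "url": page.url,
            "title": page.title(),
            "html_excerpt": page.content()[:500],
        }
        browser.close()
        return observation

if __name__ == "__main__":
    # Placeholder URL and selector; a real env would keep the browser
    # alive across steps instead of relaunching it per action.
    print(browser_step("https://example.com", "a"))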
Candidate Archetype
The ideal candidate is a strong software engineer first, with genuine curiosity and working knowledge of reinforcement learning. You’ve probably built infrastructure or developer tooling at a startup or mid-stage company, and you’ve been pulled toward the ML/AI space—maybe through side projects, open-source contributions, or a prior role adjacent to an ML team. You’re the kind of engineer who reads an RL benchmark paper and immediately thinks about how to make the environment more robust, not how to improve the policy gradient.
You thrive in ambiguity. You can take a loosely defined project requirement, such as “build an environment that tests an agent’s ability to navigate a file system and execute multi-step bash workflows,” and deliver a working, tested, documented system without needing a detailed spec. You move fast, but you care about reliability: you know that an environment which breaks silently poisons training data.
Why This Role Matters
- RL environment quality is one of the biggest bottlenecks in agentic AI training today. Environments that are brittle, non-deterministic, or poorly instrumented produce noisy reward signals that directly degrade model performance. You’ll be solving one of the highest-leverage infrastructure problems in AI.
- You’ll work across a portfolio of projects spanning different AI labs and model capabilities—no single-product monotony. The environment types you build will evolve as the frontier of agent capabilities moves.
- Alignerr is a small, high-impact team inside a well-funded company (Labelbox). You’ll have startup-level ownership with growth-stage resources.
Alignerr Services at Labelbox
Alignerr is Labelbox’s human data organization, purpose-built to generate the high-quality training data that powers the next generation of AI models. We partner directly with leading AI labs to produce reinforcement learning environments, evaluation benchmarks, and expert-annotated datasets that push model capabilities forward. Our team sits at the intersection of software engineering, ML infrastructure, and human-in-the-loop data production.