Applied Research Engineer, Agents

Labelbox • Full-time • San Francisco Bay Area • 3w ago

Role Overview

As an Applied Research Engineer at Labelbox, you’ll sit at the junction of advanced AI research and real product impact, with a focus on the data that makes modern agents work: browser interactions, SWE/code traces, GUI sessions, and multi-turn workflows. You’ll define the data needed to advance capable, adaptable agents and help shape Labelbox’s strategy for collecting, synthesizing, and evaluating it. You’ll bring expertise in LLM agents and planning/execution loops, plus creativity in tackling problems across data design, interaction, and measurement. You’ll publish meaningful results, collaborate with customer researchers in frontier AI labs, and turn prototypes into reliable, scalable features.

Your Impact

Create frameworks and tools to construct, train, benchmark and evaluate autonomous agent capabilities.
Design agent-focused data programs using supervised fine-tuning (SFT) and reinforcement learning (RL) methodologies.
Develop data pipelines from diverse sources like code repositories, web browsers, and computer systems.
Implement and adapt popular open-source agent libraries and benchmarks with proprietary datasets and models.
Engage with research teams in frontier AI labs and the wider AI community to understand evolving agent data needs for frontier models and share best practices.
Collaborate closely with frontier AI lab customers to understand requirements and guide model development.
Publish research findings in academic journals, conferences, and blog posts.

What You Bring

Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or related field.
At least 3 years of experience addressing sophisticated ML problems with successful delivery to customers.
Experience building and training autonomous agents—tool use, structured outputs, multi-step planning—across browsers/GUI, codebases, and databases using SFT and RL.
Experience constructing and evaluating agentic benchmarks (e.g., SWE-bench, WebArena, τ-bench, OSWorld) and reliability/efficiency suites (e.g., WABER).
Adept at interpreting research literature and quickly turning new ideas into prototypes.
Deep understanding of frontier models (autoregressive, diffusion), post-training (SFT, RLVR, RLAIF, RLHF, etc.), and their human data requirements.
Proficient in Python, data science libraries and deep learning frameworks (e.g., PyTorch, JAX, TensorFlow).
Strong analytical and problem-solving abilities in ambiguous situations.
Excellent communication skills.
Track record of publications in top-tier AI/ML venues (e.g., ACL, EMNLP, NAACL, NeurIPS, ICML, ICLR).

Labelbox Applied Research

At Labelbox Applied Research, we're committed to pushing the boundaries of AI and data-centric machine learning, with a particular focus on advanced human-AI interaction techniques. We believe that high-quality human data and sophisticated human feedback integration methods are key to unlocking the next generation of AI capabilities. Our research team works at the intersection of machine learning, human-computer interaction, and AI ethics to develop innovative solutions that can be practically applied in real-world scenarios.

We foster an environment of intellectual curiosity, collaboration, and innovation. We encourage our researchers to explore new ideas, engage in open discussions, and contribute to the wider AI community through publications and conference presentations. Our goal is to be at the forefront of human-centric AI development, setting new standards for how AI systems learn from and interact with humans.