Machine Learning Research Engineer, Agent Applications

Scale AI • Full-time • San Francisco, California, United States • 2w ago

About Scale

At Scale AI, our mission is to accelerate the development of AI applications. For 8 years, Scale has been the leading AI data foundry, helping fuel the most exciting advancements in AI, including: generative AI, defense applications, and autonomous vehicles. With our recent Series F round, we’re accelerating the abundance of frontier data to pave the road to Artificial General Intelligence (AGI), and building upon our prior model evaluation work with enterprise customers and governments, to deepen our capabilities and offerings for both public and private evaluations.

About This Role

This role is at the intersection of cutting-edge AI research and practical application, with a focus on studying the data types essential for building state-of-the-art agents, such as browser and SWE agents. The ideal candidate will explore the data landscape needed to advance intelligent, adaptable AI agents, guiding the data strategy at Scale to drive innovation. This position requires not only expertise in LLM agents and planning algorithms but also creativity in addressing novel challenges related to data, interaction, and evaluation. You will contribute to impactful research publications on agents, collaborate with customer researchers, and work alongside the engineering team to translate these advancements into real-world, scalable solutions.

Ideally you’d have:

Practical experience working with LLMs, with proficiency in frameworks like Pytorch, Jax, or Tensorflow. You should also be adept at interpreting research literature and quickly turning new ideas into prototypes.
A track record of published research in top ML venues (e.g., ACL, EMNLP, NAACL, NeurIPS, ICML, ICLR, COLM, etc.)
At least three years of experience addressing sophisticated ML problems, either in a research setting or product development.
Strong written and verbal communication skills and the ability to operate cross-functionally.

Nice to have:

Hands-on experience with open source LLM fine-tuning or involvement in bespoke LLM fine-tuning projects using Pytorch/Jax.
Hands-on experience and publications in building evaluations and benchmarks related to AI agents such as tool-use, text2SQL, browser agents, coding agents and GUI agents.
Hands-on experience with agent frameworks such as OpenHands, Swarm, LangGraph, etc.
Familiarity with agentic reasoning methods such as STaR and PLANSEARCH
Experience working with cloud technology stack (eg. AWS or GCP) and developing machine learning models in a cloud environment.

Our research interviews are crafted to assess candidates' skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organizational culture. We will not ask any LeetCode-style questions.