Senior/Staff Research Scientist, Frontier Benchmarks

Snorkel • Full-time • Redwood City, CA (Hybrid); San Francisco, CA (Hybrid); United States (Remote) • 3w ago

ABOUT THE ROLE

We're looking for a Staff or Senior Research Scientist to collaborate with partners and lead the development of the next frontier benchmarks and datasets. This is a highly visible, customer-facing role at the intersection of research, company strategy, and go-to-market. You'll design datasets that take frontier model performance into account, working with our academic partners, and then partner with delivery, product, and go-to-market teams to scale out production. You will also serve as a credible technical partner for our customers and prospects, and drive results that impact the broader research community.

This role reports directly to the Head of Research and is ideal for someone who is energized by cross-functional work and wants to understand how startups operate across research, data operations, and commercial teams.

MAIN RESPONSIBILITIES

Design state-of-the-art datasets that drive frontier model training and evaluation, based on current model performance and academic partnerships.
Translate benchmark insights into clear, compelling narratives that articulate the ROI of expert-curated data for customer-facing presentations, technical reports, and go-to-market materials.
Work cross-functionally with data operations, product, engineering, and strategy to surface research findings that inform the company roadmap.
Stay at the frontier of LLM evaluation research and bring best practices into Snorkel's workflows.
Represent Snorkel's research externally through publications, blog posts, conference talks, and customer engagements that advance the conversation around data-centric AI.

PREFERRED QUALIFICATIONS

Strong research background in AI/ML evaluation, NLP, or related fields, with a track record of rigorous experimental design — especially around measuring the impact of training and evaluation data on model behavior.
Exceptional communication skills, with the ability to present complex technical findings clearly to both technical and non-technical audiences.
Comfort operating in a fast-moving, cross-functional environment with ambiguous problem spaces.
Genuine interest in GTM strategy, startup dynamics, and the commercial side of AI data services.
Ph.D. in machine learning, NLP, or a related field preferred; equivalent industry or research lab experience considered.