Join the ML research team at Scale to pioneer synthetic and hybrid data creation and post-training research with an emphasis on the science of data. We are building innovative research frameworks to improve post-training data pipelines and evaluation methods for LLMs. This research forms the foundation for Scale’s ability to deliver high-quality, data-driven solutions that enhance model quality. Our work enables Scale to support the most advanced ML use cases, driving meaningful progress in capabilities evaluation and alignment for industry-leading customers. You’ll be working on cutting-edge research problems aimed at advancing post-training methodologies and evaluation science. Working at Scale will give you opportunities to collaborate with leading research teams and gain exposure to a wide range of challenges in machine learning.
Example Projects:
- Studying the boundaries of model generalization and capabilities to inform data-driven advancements.
- Research on synthetic data and hybrid data with humans in the loop to scale up high-quality data generation.
- Investigating strategies to refine and enhance data pipelines for model improvement.
- Researching and developing advanced evaluation methodologies for assessing model performance and alignment across diverse use cases.
- Advancing the understanding of human-AI collaboration through evaluation science and tooling development.
Required to have:
- Currently enrolled in a BS/MS/PhD program with a focus on Machine Learning, Deep Learning, Natural Language Processing, or Computer Vision, with a graduation date in Fall 2025 or Spring 2026
- Prior research experience or a track record of publications on LLMs, NLP, multimodal models, agents, or a related field
- Experience with one or more general-purpose programming languages, such as Python, JavaScript, or similar
- Ability to speak and write in English fluently
- Availability for a Summer 2025 internship (May/June start)
Ideally you’d have:
- A previous internship in Machine Learning, Deep Learning, Natural Language Processing, Adversarial Robustness, Alignment, Evaluation, or Agents
- Experience as a researcher, whether through internships, full-time roles, or work at a research lab
- Publications in top-tier ML conferences such as NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, ICCV, or ECCV, or contributions to open-source projects