About the role
Anthropic is seeking a Research Engineer/Scientist to join the Science of Scaling team, responsible for developing the next generation of large language models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems. You'll contribute across the entire stack, from low-level optimizations to high-level algorithm and experimental design, balancing research goals with practical engineering constraints.
Responsibilities:
- Conduct research into the science of converting compute into intelligence
- Independently lead small research projects while collaborating with team members on larger initiatives
- Design, run, and analyze scientific experiments to advance our understanding of large language models
- Optimize training infrastructure to improve efficiency and reliability
- Build developer tooling to enhance team productivity
You may be a good fit if you:
- Have significant software engineering experience and a proven track record of building complex systems
- Hold an advanced degree (MS or PhD) in Computer Science, Machine Learning, or a related field
- Are proficient in Python and experienced with deep learning frameworks
- Are results-oriented with a bias towards flexibility and impact
- Enjoy pair programming and collaborative work, and are willing to take on tasks outside your job description to support the team
- View research and engineering as two sides of the same coin, seeking to understand all aspects of the research program to maximize impact
- Care about the societal impacts of your work and have ambitious goals for AI safety and general progress
Strong candidates may have:
- Experience with JAX
- Experience with reinforcement learning
- Experience working on high-performance, large-scale ML systems
- Familiarity with accelerators, Kubernetes, and OS internals
- Experience with language modeling using transformer architectures
- Background in large-scale ETL processes
- Experience with distributed training at scale (thousands of accelerators)
Strong candidates need not have:
- Experience in all of the above areas — we value breadth of interest and willingness to learn over checking every box
- Prior work specifically on language models or transformers; strong engineering fundamentals and ML knowledge transfer well
- An advanced degree — exceptional engineers with strong research instincts are equally encouraged to apply