About the role:
Our Scalability and Capability Inference team is responsible for building and maintaining the critical systems that serve our LLMs to a diverse set of consumers. As the cornerstone of our service delivery, the team focuses on scaling inference systems, ensuring reliability, optimizing compute resource efficiency, and developing new inference capabilities. The team tackles complex distributed systems challenges across our entire inference stack, from optimal request routing to efficient prompt caching.
You may be a good fit if you:
- Have significant software engineering experience
- Are results-oriented, with a bias towards flexibility and impact
- Pick up slack, even when it falls outside your job description
- Enjoy pair programming (we love to pair!)
- Want to learn more about machine learning research
- Care about the societal impacts of your work
Strong candidates may also have experience with:
- High-performance, large-scale distributed systems
- Implementing and deploying machine learning systems at scale
- LLM optimization techniques such as batching and caching strategies
- Kubernetes
- Python
Representative projects:
- Optimizing inference request routing to maximize compute efficiency
- Autoscaling our compute fleet to match compute supply with inference demand
- Contributing to new inference features (e.g., structured sampling, fine-tuning)
- Supporting inference for new model architectures
- Ensuring smooth and regular deployment of inference services
- Analyzing observability data to tune performance based on production workloads
Deadline to apply: None. Applications will be reviewed on a rolling basis.