About the role
Anthropic’s Capacity team is looking for an Engineering Manager to own and manage cloud spend across a massively scaled, multi-cloud environment. You’ll work closely with research, engineering, and finance teams to ensure we have scalable systems for capacity management, high-quality data and insights for planning, and engineering roadmaps that deliver efficiency wins.
Responsibilities:
- Design, develop, and deliver capacity management systems for AI workloads on heterogeneous infrastructure
- Build and maintain robust usage attribution that enables in-depth, actionable, data-driven insights
- Build a deep understanding of research and training workloads to accurately forecast infrastructure needs
- Oversee design and implementation of forecasting tools and software systems for managing billions of dollars in spend
- Proactively identify efficiency opportunities and collaborate with teams across the org to increase effective capacity for Anthropic
- Partner closely with Finance and leadership, providing clear, detailed capacity inputs for financial planning and strategic decision-making
You may be a good fit if you:
- Have experience managing $XXXM to $XB in infrastructure spend
- Have experience working with public clouds (AWS, GCP, Azure, etc.) and/or hybrid on-prem and cloud environments
- Have experience setting up capacity management systems that scale with growing organizations
- Are comfortable leveraging data and have experience building observability for complex systems
- Have strong interpersonal skills that enable you to influence and build cross-organizational support for capacity initiatives
- Have familiarity with LLMs and a deep interest in learning more about research and model training workloads
Strong candidates may also have some of the following:
- Past experience managing capacity for AI research and production workloads
- Past experience partnering with senior leadership, both technical and non-technical, to drive company-level reporting and decision-making