About the role
As an early member of our Data Science team, you will play a crucial role in ensuring our AI systems deliver exceptional user experiences through reliable, low-latency performance. You'll be at the intersection of data science and infrastructure, using rigorous analysis to understand how platform performance impacts user behavior and identifying high-impact opportunities to improve our systems' reliability and responsiveness.
Your work will directly influence how millions of users experience Claude and our other AI systems. You'll quantify user sensitivity to latency, reliability, errors, and refusal rates, then translate these insights into actionable recommendations that drive meaningful improvements to our platform infrastructure. This role offers the unique opportunity to shape the technical foundation that enables safe, frontier AI to scale globally.
Responsibilities:
- Design and execute comprehensive analyses to understand how latency, reliability, errors, and refusal rates affect user engagement, satisfaction, and retention across our platform
- Identify and prioritize high-impact infrastructure improvements by analyzing user behavior patterns, system performance metrics, and the relationship between technical performance and business outcomes
- Develop robust methodologies to measure platform reliability and performance, including defining key metrics, establishing baselines, and creating monitoring systems that enable proactive optimization
- Collaborate with engineering teams to design A/B tests and controlled experiments that measure the impact of platform improvements on user experience and system performance
- Investigate performance anomalies, conduct root cause analysis of reliability issues, and provide data-driven insights to guide engineering priorities and architectural decisions
- Work closely with Platform Engineering, Product, and Research teams to translate technical performance data into user experience insights and strategic recommendations
- Build models to forecast platform capacity needs, predict potential reliability issues, and optimize resource allocation to maintain optimal performance at scale
- Present complex technical analyses and recommendations to both technical and non-technical stakeholders, including engineering leadership and executive teams
You may be a good fit if you have:
- Advanced degree in Statistics, Computer Science, Engineering, Mathematics, or a related quantitative field, with 5+ years of hands-on data science experience
- Deep understanding of distributed systems, cloud infrastructure, and performance engineering, with experience analyzing large-scale system metrics
- Expertise in experimental design, causal inference, statistical modeling, and A/B testing frameworks, particularly in high-scale technical environments
- Strong skills in Python, SQL, and data analysis tools, with experience working with large datasets and real-time streaming data
- Experience translating technical performance metrics into user experience insights, including understanding how system performance affects user engagement and satisfaction
- Proven ability to work effectively with engineering teams and translate complex technical analyses into actionable recommendations for diverse audiences
- Track record of using data science to drive significant improvements in system performance, user experience, or business outcomes
Strong candidates may also have:
- Hands-on experience with observability tools, APM systems, and infrastructure monitoring platforms (e.g., Prometheus, Grafana, Datadog)
- Experience with machine learning infrastructure and model serving, along with an understanding of the unique performance characteristics of AI/ML systems
- Familiarity with SRE practices, error budgets, SLOs/SLIs, and reliability engineering principles
- Experience analyzing the performance of real-time or near-real-time systems, including an understanding of latency distributions and tail behavior
- Background in user behavior analysis, growth metrics, or product analytics, particularly in understanding how technical performance drives user outcomes
- Direct experience working with platform or infrastructure teams in high-scale technology environments