Member of Technical Staff - Model Evaluation

xAI • Full-time • Bay Area • 1w ago

ABOUT THE ROLE:

You will join the Enterprise Model Evaluation team to define and measure how Grok performs on high-value real-world tasks. Your focus will be on identifying critical enterprise use cases and building evaluations that capture performance on those use cases. In doing so, you will uncover model weaknesses, track performance over time, and work with the core modeling team to hill-climb.

Your work will directly shape the future of Grok's capabilities and intelligence. We're hiring intelligent, execution-focused people for this high-intensity environment—interstellar ambitions demand exceptional talent in a flat, low-overhead culture where smart people get shit done without bureaucracy.

RESPONSIBILITIES:

Identify high-value enterprise use cases
Provide complete assessments of models
Deep-dive into model training and data to identify the weaknesses revealed in evaluation
Communicate with the modeling and data teams to produce plans to improve model quality

BASIC QUALIFICATIONS:

Model assessment and evaluation task development (including public and in-house benchmarking)
Collect and synthesize data for new evals
Build infrastructure and frameworks for easy-to-use model evaluation; familiarity with inference frameworks like SGLang and vLLM