ABOUT THE ROLE:
You will join the Enterprise Model Evaluation team to define and measure how Grok performs on high-value real-world tasks. Your focus will be on identifying critical enterprise use cases and building evaluations that capture performance on those use cases. In doing so, you will uncover model weaknesses, track performance over time, and work with the core modeling team to hill-climb.
Your work will directly shape the future of Grok's capabilities and intelligence. We're hiring intelligent, execution-focused people for this high-intensity environment. Interstellar ambitions demand exceptional talent in a flat, low-overhead culture where smart people get shit done without bureaucracy.
RESPONSIBILITIES:
- Identify high-value enterprise use cases
- Provide complete assessments of models
- Dive deep into model training and data to identify the weaknesses revealed in evaluation
- Communicate with the modeling and data teams to produce plans to improve model quality
BASIC QUALIFICATIONS:
- Experience with model assessment and evaluation task development (including public and in-house benchmarking)
- Experience collecting and synthesizing data for new evals
- Experience building infrastructure and frameworks for easy-to-use model evaluation; familiarity with inference frameworks such as SGLang and vLLM