About the team
xAI is committed to building artificial intelligence that uncovers the truth about the Universe. We believe computational power is the cornerstone of achieving this vision. Our goal is to harness energy at the scale of a Kardashev Type I civilization to drive efficient computing, advancing scientific discovery, improving quality of life, and supporting humanity’s journey toward becoming a multi-planetary (Type II) civilization.
Current AI systems are approximately one million times less efficient than their theoretical limit. Our team’s mission is to pioneer next-generation AI systems that achieve breakthrough efficiency and scalability across new hardware, compilers, and models.
About the role
As part of xAI and in collaboration with our partners, you will co-design next-generation AI systems spanning silicon, software compilers, and models. This role involves responsibility across multiple layers of the stack, including:
- Optimizing front-end compilers that lower training and inference workloads into compute graphs for new AI hardware, and estimating the performance of those graphs.
- Writing PTX or lower-level code for critical kernels to maximize performance on cutting-edge microarchitectures.
- Designing and refining new hardware architectures to push the boundaries of computational efficiency.
- Leveraging AI-driven approaches to revolutionize software compiler development and hardware design processes.
Tech Stack
- Programming Languages: Python, Rust, C++
- Kernel Coding: CUDA, Triton, PTX, and other accelerator-specific assembly languages
- Modular Compiler Toolchains: OpenXLA, MLIR, LLVM
- Hardware Description Languages: VHDL, Verilog, and other accelerator-friendly HDLs such as Chisel
Ideal Experience
- Proficiency in programming accelerators with CUDA, CUTLASS, HIP/ROCm, Triton, or other accelerator-specific DSLs and assembly languages.
- Experience simulating training workloads on novel AI hardware architectures.
- Experience with compiler toolchain development and a deep understanding of the advantages and disadvantages of modular tools such as OpenXLA, MLIR, and LLVM.
- Hands-on experience with distributed training and/or high-QPS production inference.
- Deep understanding of the microarchitecture of AI accelerators and GPUs.
- R&D experience in using AI for software compiler and/or hardware design.
Location
This role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to reside in or be willing to relocate to the Bay Area.
Interview Process
After submitting your application, our team will review your CV and statement of exceptional work. If selected, you’ll be invited to a 15-minute phone interview, where a team member will ask introductory questions. Successful candidates will proceed to the main interview process, which includes four technical interviews:
- Coding Assessment: Complete a coding task in a language of your choice.
- Systems Hands-On: Demonstrate practical skills in a live problem-solving session.
- Project Deep-Dive: Present your past exceptional work to a small audience.
- Meet-and-Greet: Connect with members of the broader team.
Our goal is to complete the main interview process within one week. We don’t use recruiters for assessments; every application is evaluated by a member of our technical team. All interviews are conducted via Google Meet.
Annual Salary Range
$180,000 - $500,000 USD