Tech Stack
- CUDA
- CUTLASS
- C/C++ and Python binding tools
Location
The role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to live in or near the Bay Area or be open to relocation.
Focus
- Developing and improving low-level CUDA kernel optimizations for a state-of-the-art inference and training software stack.
- Profiling, debugging, and optimizing single- and multi-GPU operations using tools such as Nsight.
- Understanding GPU memory hierarchy and computation capabilities.
- Implementing the latest methods from the deep learning literature in low-level CUDA kernels.
- Developing new ideas that bring us closer to the performance limits of a GPU.
Ideal Experience
- Building high-performance GEMM CUDA kernels on Tensor Cores or CUDA cores, either from scratch or with CuTe/CUTLASS.
- Implementing features for attention kernels, either by extending existing kernels or by writing them from scratch.
- Writing both forward and backward kernels and ensuring their correctness while accounting for floating-point error.
- Optimizing for both memory-bound and compute-bound operations.
- Using tools such as Nsight to reason about register pressure, shared-memory usage, and GPU utilization, and removing the bottlenecks they reveal.
- Staying familiar with the latest and most effective techniques for optimizing inference and training workloads.
- Using pybind to integrate custom-written kernels into a framework, especially JAX/XLA.
Interview Process
After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview (“phone interview”) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:
- Coding assessment in a language of your choice.
- Systems hands-on: Demonstrate practical skills in a live problem-solving session.
- Project deep-dive: Present your past exceptional work to a small audience.
- Meet and greet with the wider team.
Our goal is to finish the main process within one week. All interviews will be conducted via Google Meet.
Annual Salary Range
$180,000 - $440,000 USD