Tech Stack
- Kubernetes
- Buildkite / ArgoCD
- Prometheus / Grafana / PagerDuty
- Pulumi / Terraform
- SGLang: This team leads the development of SGLang (https://github.com/sgl-project/sglang/tree/main), one of the most popular open-source inference engines. You will have the opportunity to work on open-source projects.
Location
The role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or open to relocation.
Focus
- Ensure the reliability of the inference services
- Manage the continuous deployment of inference services
- Benchmark and monitor the performance of the inference engines under different production workloads
- Build CI/CD infrastructure for endpoint deployment, image publishing, and inference engines
Ideal Experiences
- Worked on large-scale, highly concurrent production serving.
- Worked on testing, benchmarking, and reliability of inference services.
- Worked on CI/CD infrastructure.
Interview Process
After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview (“phone interview”) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:
- Coding assessment in a language of your choice.
- Systems hands-on: Demonstrate practical skills in a live problem-solving session.
- Project deep-dive: Present your past exceptional work to a small audience.
- Meet and greet with the wider team.
Our goal is to finish the main process within one week. All interviews will be conducted via Google Meet.
Annual Salary Range
$180,000 - $440,000 USD