*Note: This position requires presence in our San Francisco office 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.
What You’ll Do
- Guide new customers through the technical onboarding process by:
  - Assisting ML researchers in migrating their existing workloads to Lambda’s AI Cloud Platform, ensuring that expected performance is achieved
  - Providing initial troubleshooting for technical issues that arise during the first few days of customers’ time on Lambda infrastructure
- Collaborate closely with customers to understand their needs and objectives, and offer tailored guidance and best practices for deploying models and managing GPU infrastructure
- Demonstrate how to optimize and scale training and inference workloads within Lambda by:
  - Building proof-of-concept demos
  - Creating detailed architecture diagrams
- Create and maintain detailed documentation, including technical guides, best practices and troubleshooting resources
- Conduct training sessions and workshops for customers, enabling them to effectively utilize Lambda’s products and services
- Facilitate smooth workload transitions between Lambda’s various products
- Drive customer growth by identifying opportunities to increase product adoption
- Act as a trusted advisor to new customers, ensuring successful integration and optimization of Lambda products
- Provide continuous customer feedback to influence product roadmap and enhancements
- Serve as a link between customers and internal teams
You
- Have experience in machine learning or data science with a deep understanding of model development and deployment
- Have experience using deep learning frameworks and libraries such as PyTorch, TensorFlow, DeepSpeed, etc.
- Have experience with containerization technologies such as Docker and Kubernetes
- Have experience building and optimizing LLM-based applications
- Have experience building end-to-end ML pipelines on major cloud platforms
- Have experience with Linux systems administration
- Are an excellent communicator, capable of explaining complex, technical concepts to technical and non-technical audiences
- Are customer-obsessed and strive to deliver exceptional experiences to current and future Lambda customers
- Have experience as an ML educator and/or in building and executing customer training sessions, product demos or workshops
Nice to Have
- Experience using MLOps tools such as RunAI, Weights and Biases, ClearML
- Experience in training large models using distributed systems
  - Selecting parallelism strategies
  - Multi-GPU and Multi-Node training
  - Troubleshooting and configuring NCCL/RDMA
  - Quantization
- Experience with HPC orchestration technologies such as SLURM
- Experience with automation tools such as Ansible, Puppet, Salt
Salary Range Information
Based on market data and other factors, the salary range for this position is $144,000 - $210,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.