At Weights & Biases, our mission is to build the best tools for AI developers. We founded our company on the insight that while there were excellent tools for developers to build better code, there were no similarly great tools to help ML practitioners build better models. Starting with our first experiment tracking product, we have since expanded our solution into a comprehensive AI developer platform for organizations focused on building their own deep learning models and generative AI applications.
Weights & Biases is a Series C company with $250M in funding and over 200 employees. We proudly serve over 1,000 customers and more than 30 foundation model builders including customers such as OpenAI, NVIDIA, Microsoft, and Toyota.
The Senior Site Reliability Engineer (SRE) will report to the Enterprise Engineering Manager. This person will be responsible for setting up and maintaining infrastructure standards.
In addition, this person will play a pivotal role in tool development, both external and internal. You'll help make it possible to deploy our software to our enterprise customers, establishing a strong foundation of technical excellence for our diverse customer base.
This role will also establish firm partnerships with our enterprise customers, which will boost customer satisfaction and lead to more comprehensive solutions.
This team manages the variance in infrastructure types and implements suitable solutions that cater to the unique needs of each enterprise customer. You will leverage your skills, knowledge, and adaptability to navigate this complex landscape, consistently providing high-quality solutions to our customers.
What you’ll achieve (Responsibilities)
- Set up and maintain infrastructure standards to ensure stable and smooth operations, supporting efficient functioning and laying the groundwork for meaningful improvements over time.
- Develop tools for both external and internal purposes. This involves building mechanisms to deploy our software to enterprise customers effectively, fostering a foundation of technical excellence.
- Troubleshoot and resolve issues related to operating Weights & Biases across different types of infrastructure.
- Understand the nuances of various infrastructure types and implement suitable solutions catering to the unique needs of each enterprise customer.
- Leverage technical skills, knowledge, and adaptability to effectively navigate the complex landscape of different infrastructures.
What we’re looking for (Requirements)
- 5+ years of software development experience in an enterprise software environment.
- Proficiency in Go (Golang) programming.
- Deep understanding of distributed systems.
- Experience with Kubernetes (required).
- Knowledge of monitoring and scaling services for distributed systems (such as Datadog, New Relic, OpenTelemetry, or Prometheus).
- Familiarity with at least one major cloud provider (AWS, Azure, or Google Cloud).
- Active collaboration with team members to align and achieve departmental goals.
Our Benefits
- 🏝️ Flexible time off
- 🩺 Medical, dental, and vision coverage for employees and their families
- 🏠 Remote first culture with in-office flexibility in San Francisco
- 💵 Home office budget with a new high-powered laptop
- 🥇 Truly competitive salary and equity
- Supplemental benefits may be available depending on your location
We encourage you to apply even if your experience doesn't perfectly align with the job description as we seek out diverse and creative perspectives. Team members who love to learn and collaborate in an inclusive environment will flourish with us. We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. If you need additional accommodations to feel comfortable during your interview process, reach out at careers@wandb.com.