About the Role
As the Associate Site Operations Manager, you’ll oversee the data center technicians who keep xAI’s AI infrastructure running smoothly. This role is pivotal in ensuring our systems operate at peak efficiency, supporting the compute power behind our mission. You’ll co-lead a skilled team, manage critical operations, and implement smart, sustainable solutions. We’re looking for someone with technical expertise and a proactive approach to maintain and scale our facilities effectively.
Responsibilities
• Oversee Site Operations: Manage power, cooling, networking, and hardware deployments to ensure 99.999% uptime for xAI’s AI compute systems, keeping our infrastructure reliable and ready for innovation.
• Guide Your Team: Lead and develop a team of Data Center Operations Technicians through training, performance evaluations, and a collaborative, high-performing culture aligned with xAI’s objectives.
• Streamline Processes: Take charge of hardware lifecycles, incident resolution, and inventory management, refining procedures to ensure your team operates with precision and consistency.
• Connect Key Players: Coordinate between technicians, xAI’s AI specialists, and external vendors to integrate new technology and expand capacity seamlessly.
• Drive Sustainable Solutions: Champion energy-efficient practices and sustainability efforts, optimizing resources while supporting the demands of cutting-edge AI workloads.
• Measure Success: Track and report key metrics like uptime, power efficiency, and issue resolution times, using data to enhance site performance and inform decisions.
• Handle Emergencies: Lead the team through urgent situations with clear direction, resolving issues quickly to protect our AI systems from disruption.
• Optimize Operations: Build and refine processes—such as preventative maintenance schedules with vendors and ticket workflows in Jira—to keep operations efficient and scalable.
• Support Expansion: Work with leadership to standardize best practices across sites where applicable, ensuring operations align with xAI’s ambitious growth plans.
Basic Qualifications
• 5+ years of experience in data center operations or similar critical environments, with 3+ years managing technical teams.
• Proven ability to lead teams effectively in fast-paced, high-responsibility settings.
• Solid expertise in server hardware, cabling, and data center technologies, from setup to lifecycle management.
Preferred Skills and Experience
• Experience supporting compute-heavy environments like AI, machine learning, or high-performance computing.
• Proficiency with tools like Jira and experience managing collaborative workflows across teams.
• Strong analytical skills and the ability to explain technical concepts clearly to diverse audiences.
• Familiarity with scripting (e.g., Python, Bash) to automate tasks and boost team efficiency.
• A history of partnering with vendors, scaling operations, and advancing sustainability initiatives.
• Enthusiasm for xAI’s mission to accelerate human discovery and unravel the universe.
Additional Requirements
• Ability to thrive in a dynamic, mission-focused environment with occasional on-call duties.
• Willingness to travel to data center locations as needed to support operations.
• Physical capability to handle data center tasks, including lifting up to 50 lbs, standing for long periods, and occasional ladder use.