AI training and inference are a core pillar of Meta's success. To achieve Meta's AI goals, the network infrastructure, from the networking software stack through to the network switches, must operate with a high level of reliability. Production Engineers play a key role in driving the reliability of this network by diving deep into production issues across the entire stack and building software systems that allow operations to scale appropriately. To help deliver on these goals, Production Engineering Managers play a critical role in supporting and growing the organization so that the domain succeeds in its shared goals.
Manager, Production Engineering (Network) Responsibilities:
- Support and lead engineers who are responsible for reliably scaling Meta's AI/HPC networking operations.
- Partner with teams across Meta's AI/HPC environment to ensure alignment on operational priorities and approaches across the domain.
- Understand and contribute to technical architectures, capacity plans, tooling needs, automation plans, product launch plans and create comprehensive plans for prioritizing technical and resourcing challenges.
- Drive technical architecture discussions, even on subjects you haven't had direct experience working with.
- Help define and drive a technical roadmap to meet organizational objectives.
- Help engineers develop their careers, assigning them to projects tailored to their skill levels, long-term skill development, personalities, and work styles.
- Help build and enrich an inclusive work environment made up of people from diverse backgrounds.
- Assess employee performance frequently, address underperformance, and recognize and promote strong performance.
- Balance the need to “keep things running” with allocating time to long-term, high-impact projects.
Minimum Qualifications:
- 4+ years of direct management experience in a technology role.
- BS or MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience.
- Experience operating, designing, implementing, and troubleshooting servers and networking components.
- Experience drafting and reviewing code.
- Experience building teams and/or organizations, including hiring and managing performance.
- Experience working in a cross-functional domain with high collaboration demands.
Preferred Qualifications:
- Expert knowledge of data center networking concepts (routing, switching, etc.).
- Experience operating an IB/RDMA/RoCE network in production.
- Understanding of host-side communication libraries that enable running AI training workloads.
- Experience building infrastructure automation software.
- Experience coding efficiently in at least one programming language.
About Meta:
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today—beyond the constraints of screens, the limits of distance, and even the rules of physics.
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.
Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
$177,000/year to $251,000/year + bonus + equity + benefits
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.