Meta is seeking a forward-thinking engineer to join the Production Operations team within our Data Centers. These Data Centers are the foundation upon which our rapidly scaling infrastructure efficiently operates and upon which our innovative services are delivered. Meta is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated. This person should enjoy working in a fast-paced, technical environment where adaptability and flexibility will be key to their success. We seek an IT professional with advanced, hands-on technical skills in server hardware and Linux. The candidate must be experienced in and passionate about influencing others and providing direct mentorship and up-levelling opportunities to others. Extensive knowledge of server administration and the ability to perform on complex projects in a large-scale distributed data center environment are core competencies for this individual. Further, the candidate should have knowledge and experience in a few of the following core areas: Data Center Infrastructure, Hardware repair, OS management, Tooling and Automation, Networking, Facilities Operations, or Technical Project Management.
SiteOps Data Center Production Operations Engineer Responsibilities:
- Support platform health by successfully resolving and closing complex tickets, while addressing the overall issue (i.e. addressing root cause) including, but not limited to, remote troubleshooting and physical inspection of services in data halls.
- Motivate and support team members through identified growth opportunities, champion a positive attitude and work to instill positive team behaviors.
- Perform root cause analysis of complex technical issues within the data center, ranging from automated tooling to hardware failures and network issues.
- Support the geographical area and local point of contact on the introduction of new platforms and hardware to the site and area, accelerating the time it takes to bring these products to sustained mass production.
- Be the Production Operations subject-matter expert with cross-functional teams and external vendors on large scale data center projects and initiatives.
- Lead collaboration with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation to improve global data center operations.
- Use tools and data analysis effectively to identify issues that are larger in scope and which impact one or multiple Data Centers. Take actions to communicate with all stakeholders appropriately and manage or escalate as needed.
- Drive corrective actions by working with internal hardware teams and vendors to bring complex technical issues to resolution, take ownership of ensuring high hardware quality, and influence future design to ensure ease of serviceability.
- Utilize expert technical and mentorship skills to enable others in solving complex and systemic hardware and/or software issues at scale.
- Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency throughout the data center.
- Use data analytics to drive maximum server fleet up-time and utilization rates by understanding hardware failure rates and SLAs to customers. Identify trends and systemic issues in the fleet to drive resolution.
- Maintain and update documentation such as procedures, runbooks and guides. Apply technical expertise and an understanding of the organization's needs to lead efforts to develop, facilitate and improve org-level technical training.
- Serve as an escalation point for the local Site On Call Engineer, with participation levels in the on-call rotation varying by site.
- Travel up to 15% of the time.
Minimum Qualifications:
- BS, BA or BEng in technical field or commensurate experience.
- 10+ years of technical IT experience within a Data Center environment, in a role such as Lead Engineer, Systems Administrator, DevOps Engineer, or Site Reliability Engineer.
- Experience leading technical projects related to areas such as process improvement, technology, and/or automation. Brings peers, partners, and other resources into the project both where additional expertise is needed and to provide growth and learning opportunities for others.
- Expert in Linux in a complex IT environment with the capacity to triage, debug, and troubleshoot complex, systemic issues.
- Extensive hands-on experience and knowledge of server hardware and components, including storage.
- Expert knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network.
- Experience managing multiple technical issues concurrently and driving each to its root cause.
- Capacity to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience. Clearly explains technical problems with data and analysis, and provides detailed feedback and solutions.
- Experience debugging, modifying and developing code in at least one major scripting or programming language and orchestration system, such as: Bash, PHP, Python, SQL, Rust, Go, Puppet, Chef, or Ansible.
- Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console.
Preferred Qualifications:
- Experience with large-scale AI implementations.
- Six Sigma knowledge/certification.
- PMP or equivalent project portfolio experience.
- Previous direct people leadership experience.
About Meta:
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today—beyond the constraints of screens, the limits of distance, and even the rules of physics.
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.
Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
$62.98/hour to $186,000/year + bonus + equity + benefits
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.