Lead, Systems Quality and Reliability
Mission: You will own, build, and manage RMA and failure analysis (FA) debug and root-cause analysis for existing and new Groq AI/ML products, including conducting the tests yourself.
Responsibilities & opportunities in this role:
- Conduct and lead debug and root-cause analysis of field RMAs. Collaborate with Systems Engineers, Hardware Engineers, Software Engineers and Operations Engineers as required.
- Scale Root Cause Failure Analysis capabilities within your organization.
- Create Failure Analysis result reports that align with standard 8D or similar processes.
- Develop and optimize RMA testing strategy to improve the timeliness and effectiveness of the characterization process.
- Analyze RMA, Failure Analysis, and Repair data. Identify trends and raise quality alerts when necessary. Drive resolution, containment, and mitigation plans for such quality alerts.
- Oversee hardware quality performance, monitoring field quality data and associated metrics including RMA Rates, MTBF, and Reliability Ratio.
- Manage operational performance of Failure Analysis at contract manufacturer(s), ensuring partner(s) achieve key performance indicators, including FA cycle times, fault duplication rates, and fault isolation rates.
- Drive learnings from RMA / FA back into Manufacturing, Engineering, and Support teams.
- Oversee the set-up of new products into Failure Analysis operations.
Ideal candidates have/are:
- BS/MS in Electrical Engineering, Physics, or a related field
- 7+ years of hands-on systems test and/or validation engineering experience
- Proven hands-on management and leadership experience
- Competence using lab equipment such as oscilloscopes, logic analyzers, power analyzers, etc.
- Deeply cognizant of the differences between system test and ATE test
- Experience enabling reliability tests such as HTOL and quality tests such as burn-in
- Working knowledge of failure analysis techniques and tools such as FIB, SEM, TDR, VNA, and CSAM
- Working knowledge of fault isolation techniques such as OBIRCH, DLS/LADA, LVP, and LVI
- Proficiency with high-speed interfaces (SerDes, PCIe, DDR)
- Experience testing power sub-sections (e.g. POLs, VRMs, etc.)
- Familiarity with lower-speed interfaces such as SPI, I2C, and CAN bus
- Proficiency in Python, Perl, C++, or other languages on UNIX/Linux
- Experience in Failure Analysis for one or more of the following: microprocessors, complex SoC devices, AI systems, servers, network systems
- Excellent knowledge of PCB card and system-level test and debug
- Able to manage factory floor partners (CMs) for RMA / FA activities
Attributes of a Groqster:
- Humility - Egos are checked at the door
- Collaborative & Team Savvy - We make up the smartest person in the room, together
- Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
- Curious & Innovative - Take a creative approach to projects, problems, and design
- Passion, Grit, & Boldness - No-limit thinking, fueling informed risk-taking
If this sounds like you, we’d love to hear from you!
Compensation: At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $186,915 to $305,900, determined by your skills, qualifications, experience and internal benchmarks.
#LI-Remote