About the Role
In this role, you will build and lead our forward-deployed engineering (FDE) team, working directly with leading AI labs and enterprises to design, build, and deliver high-quality datasets that support their most critical AI initiatives.
You’ll be responsible for establishing engineering best practices for data quality and validation. This includes designing innovative ML approaches to enhance human-in-the-loop (HITL) techniques and improving the efficiency of data generation and review processes. Your team will own systems and tools that enable consistent, scalable, and high-quality data delivery to our customers.
Sitting at the critical intersection of data engineering, ML engineering, and operations, you’ll partner closely with the DaaS Delivery team and cross-functional stakeholders to define quality standards, develop measurement frameworks, drive ML-based workflows that improve data pipelines, and unblock projects through technical innovation. As the founding member, you’ll also roll up your sleeves to define and own the workflows and processes needed to deliver exceptional data at scale.
Main Responsibilities
- Build and lead the Forward Deployed Engineering DaaS organization, setting a clear vision, defining the operating model, and scaling its impact across Snorkel’s Expert Data-as-a-Service workflows
- Build, mentor, and motivate high-performing teams, cultivating the skills and culture needed to consistently deliver exceptional outcomes and transformative impact
- Own and evolve the data pipeline components of the DaaS stack, including model-assisted labeling, quality estimation, and data-centric feedback loops that guide human input
- Partner with customers, including research and engineering teams at frontier AI labs, to scope requirements for complex, novel AI datasets and translate their needs into delivery-ready workflows
- Establish and execute scalable processes for data generation and validation, quality measurement, and delivery readiness across a range of annotation projects
- Develop robust systems for request intake, task orchestration, SLA tracking, and progress monitoring to ensure seamless execution and prevent critical delivery gaps
- Prototype and deploy LLM-based workflows to assess annotation quality, augment human review and data generation, and accelerate delivery timelines
- Collaborate cross-functionally with research and engineering teams to develop and productionize innovative HITL data generation methods and advanced quality techniques, and to improve internal delivery tooling
- Drive continuous improvement by developing reusable workflows, surfacing operational insights, and enabling the organization to scale faster while maintaining high quality
What We’re Looking For
- 7+ years of experience in applied data or ML engineering roles, including 2+ years leading high-performing technical teams in a hands-on management capacity
- Demonstrated success in customer-facing roles, with strong enthusiasm for data pipelines and LLM-based workflows
- Proven track record of managing technical field teams in fast-paced, delivery-focused environments with competing priorities
- Experience as a player-coach, comfortable being hands-on while supporting and scaling the team
- Proven ability to thrive in fast-paced, ambiguous environments with cross-functional stakeholders
- Strong practical experience with LLM-based workflows, Python, SQL, and data tooling (e.g., pandas, Plotly, Streamlit, Dash)
- Bonus: experience working with labeling workflows or internal tooling for data delivery orgs
The compensation range for Tier 1 locations (San Francisco Bay Area and New York City) is $220K–$350K OTE. All offers also include equity in the form of employee stock options. Our compensation ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
Why Join Snorkel AI?
At Snorkel AI, we're building the future of data-centric AI. Our Expert Data-as-a-Service organization partners with world-class customers to solve some of the hardest data challenges: creating the training and evaluation data that powers the next generation of LLMs and AI systems. You'll work directly on projects that impact real production systems while shaping how internal teams deliver faster, better, and more intelligently. This is a rare opportunity to own technical data workflows and be a founding member of the technical DaaS team.