Model Evaluation & Data Quality Lead

chatgpt | Millbrae, CA | May 17th, 2026

Occupations:

Data Scientists
Data Warehousing Specialists
Computer and Information Systems Managers
Software Developers
Computer and Information Research Scientists

Industries:

Other Professional, Scientific, and Technical Services
Educational Support Services
Medical and Diagnostic Laboratories
Newspaper, Periodical, Book, and Directory Publishers
Management, Scientific, and Technical Consulting Services

Job Description: ML Data Team Lead - Twelve Labs

Location: San Francisco, CA; Remote
Company: Twelve Labs

Role Overview:

You will be a vital member of the ML Data Team, leading the full spectrum of video-language data preparation and model evaluation. This role involves high ownership: defining dataset needs in consultation with research and product teams, designing and building data pipelines, and driving the post-training model evaluation strategy. You will focus on automating repetitive partnership, annotation, and quality evaluation work.

Key Responsibilities:

Model Evaluation: Design and build robust model evaluation frameworks, automating repetitive processes while maintaining efficiency and depth in obtaining metrics and feedback.
Portfolio Monitoring: Manage resource allocation and timelines, adjusting direction flexibly based on real-time information across data streams.
External Partner Collaboration: Enhance dataset and process quality through seamless collaboration with vendors and outsourcing partners.
Data Quality & Tooling Advancement: Establish labeling guidelines, monitor data quality, and improve tools/infrastructure to build a sustainable data operations framework.
Internal Collaboration: Partner with Engineering and AI Model teams to align on data needs, design analytical tools (reports/dashboards), and communicate project progress.

Qualifications (Required):

5+ years of experience in an AI-focused data operations organization.
Proven track record of designing and executing large-scale data or evaluation projects (gathering, labeling, post-processing).
Ability to analyze complex data, identify patterns, and distill findings into crisp annotation guidelines or quality reports.
Proficiency with Python, LLMs, or other industry tools for automation.
Excellent communication and project management skills.
Foundational understanding of and interest in LLMs, VLMs, and multimodal AI.
Strong belief that data is key to AI model performance and assessment.

Qualifications (Preferred / Stand Out):

Experience in data collection and labeling for multimodal language models.
Experience in red teaming, localization testing, or evaluation-focused fields.
Experience working with research scientists and engineers.
Expertise or interest in video-centric domains (e.g., sports, advertising, content creation).

Tech Stack:

Development & Analysis: Python (pandas, Jupyter, etc.)
Data Management & Visualization: Amazon S3, framework-agnostic visualization tools.
Project Management: Linear, Notion.

Compensation & Benefits:

Salary Range: $150,000 - $160,000 USD
Open and inclusive work culture.
Work closely with a mission-driven team on cutting-edge AI technology.
Full health, dental, and vision benefits.
Flexible PTO and parental leave policy.
Office closed the week of Christmas and New Year's.
