Engineering Manager, Evaluation Platform
Location: Austin, TX (hybrid; on-site 2 days per week in the Austin office)
Company: Procore (Construction Intelligence organization)
Reports to: Sr Director, Procore AI Engineering
Machine Learning & Artificial Intelligence

Job Summary
Build infrastructure and tooling to measure, benchmark, and improve the quality of AI agents (Search Agent, RFI Create Agent, Invoice Agent, etc.). Own the end-to-end evaluation lifecycle: defining quality metrics, building evaluation frameworks, and delivering interfaces that turn evaluation results into actionable insights.

What You'll Do
Lead and grow a team of engineers focused on evaluation infrastructure, quality measurement, and developer tooling for AI agents.
Define the technical vision and roadmap for the Evaluation Platform, covering both offline and online evaluations.
Partner with AI/ML, Product, and Agent teams to define quality metrics (relevance, accuracy, latency, safety, user satisfaction, token usage) and build automated pipelines to measure them.
Design and deliver user-facing evaluation tools for assessing agent output quality, comparing model versions, and identifying regressions.
Build frameworks for human-in-the-loop evaluation (annotation workflows, rating interfaces, inter-rater reliability).
Establish CI/CD quality gates for agent version releases.
Drive engineering excellence (code quality, system reliability, test coverage, on-call health, technical debt management).
Recruit, mentor, and develop engineers, fostering a culture of ownership and rigorous experimentation.

What We're Looking For
5+ years of experience managing engineering teams or serving as a technical lead, and 7+ years of total software engineering experience.
Experience building evaluation, quality measurement, or observability platforms for LLM-based or agentic systems (RAG pipelines, multi-step agents, tool-use agents).
Strong understanding of evaluation methodologies (precision/recall, LLM-as-judge, human annotation, A/B testing, statistical significance).
Proven ability to translate ambiguous problem spaces into clear technical strategies and executable roadmaps.
Hands-on technical depth in backend systems, data pipelines, or distributed infrastructure (Python, Go, or similar).
Familiarity with evaluation frameworks such as RAGAS, DeepEval, LangFuse, or custom eval harnesses.
Background in search relevance (NDCG, MRR) or information retrieval quality systems.
Experience with construction-tech, procurement, or enterprise B2B SaaS domains (preferred).

Compensation & Benefits
Base Pay Range: $168,560.00 - $231,770.00 USD annually
This role is eligible for equity compensation and/or bonus incentive compensation. Actual compensation is based on job-related skills, experience, education/training, and location.

For Los Angeles County (unincorporated) candidates: Procore will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with applicable laws.