Senior Quality Engineer - Evaluation Engineering (AI Models)

Incedo
New York, NY
May 24th, 2026
Position: Senior Evaluation Engineer (AI Models)
Location: Remote (Must be local to New York City, NY / Fort Mill, SC / San Diego, CA)
Employment Type: Full-Time

About the company

Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI from AI@Scale.

As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together.

With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.

Role Overview

We are seeking an experienced Evaluation Engineer – AI Models to join our growing AI and Digital Engineering organization. This role is ideal for a senior Quality Engineering professional with hands-on experience assessing AI/ML and Generative AI model performance across both technical and business domains.

The Evaluation Engineer will design and execute model evaluation strategies, validate LLM outputs, and partner closely with Data Science, Product, and Engineering teams to ensure our AI systems are accurate, safe, reliable, and production-ready.
Success in this role requires analytical depth, strong testing discipline, and the ability to communicate findings clearly to diverse stakeholders.

Key Responsibilities

- AI Model Evaluation: Design, develop, and execute evaluation strategies for AI/ML and Generative AI models.
- Model Output Validation: Assess accuracy, relevance, consistency, hallucination risk, bias, safety, and performance.
- Evaluation Framework Development: Create automated and manual evaluation frameworks for LLM-based applications.
- Test Case & Benchmark Design: Develop test cases, benchmarking approaches, and quality metrics for AI model validation.
- Cross-Functional Collaboration: Work with Data Scientists, Product Managers, and Engineers to improve model quality.
- Data Analysis: Analyze model behavior using structured and unstructured datasets.
- Regression & Continuous Validation: Perform regression testing and ongoing validation for model updates.
- Documentation & Reporting: Document findings, defects, risks, and recommendations for technical and business audiences.
- UAT & Production Support: Support UAT and production validation for AI-enabled products.
- Quality Governance: Contribute to QA best practices, automation strategies, and AI quality governance initiatives.

Required Qualifications

- 10+ years in Quality Assurance / Quality Engineering / Software Testing.
- 2+ years hands-on experience in AI model evaluation, Generative AI testing, or ML validation.
- Strong understanding of AI/ML concepts, LLM behavior, prompt evaluation, and model testing methodologies.
- Experience with API testing, automation frameworks, and data validation techniques.
- Familiarity with evaluation metrics such as precision, recall, accuracy, grounding, relevance, and hallucination detection.
- Experience testing AI-powered applications, conversational AI, or GenAI platforms.
- Strong analytical and problem-solving skills.
- Excellent verbal and written communication skills.
- Ability to work independently in a remote, cross-functional environment.

Preferred Skills

- Experience with Python and AI/ML testing tools or frameworks.
- Exposure to prompt engineering and RAG (Retrieval-Augmented Generation) validation.
- Knowledge of cloud platforms: AWS, Azure, or Google Cloud Platform.
- Experience working in Agile/Scrum environments.
- Financial Services or Wealth Management domain experience is a plus.

Education

Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field is required.