{"schemaVersion":"jobsearcher.job.v1","id":"e902a88e4062520da0ff75fd","url":"https://jobsearcher.com/jobs/e902a88e4062520da0ff75fd","canonicalUrl":"https://jobsearcher.com/jobs/e902a88e4062520da0ff75fd","title":"Software Engineer, Evals","description":"There is no public benchmark for a customer's definition of quality. Evals turns accepted work, corrections, and failures into rubrics, golden sets, and routing decisions — the loop that makes every completed task improve the next one. You will build that loop.\r\nWhat you will do\r\nBuild the rubric engine that lets domain experts express what good looks like in terms they recognize, then scores every run against it.\r\nTurn accepted work into golden sets automatically, and corrections into labeled failure modes.\r\nDrive routing with eval results: which model, which context, which level of human review each task deserves.\r\nDesign the dashboards and review surfaces where customer experts judge, correct, and approve agent work.\r\nYou will thrive in this role if you\r\nHave worked on ML evaluation, data quality, or human-in-the-loop systems and know how easily metrics drift from meaning.\r\nAre rigorous about statistics but pragmatic about products — an eval no expert will use measures nothing.\r\nWant your work to be the company's answer to the question every enterprise buyer asks: how do you know it's right?","company":"Context","rawCompany":"context","city":"Millbrae","state":"CA","isRemote":false,"isActive":true,"createdAt":"2026-06-25T01:16:36.988Z","occupations":[],"industries":[],"jobPosting":{"@context":"https://schema.org","@type":"JobPosting","title":"Software Engineer, Evals","description":"There is no public benchmark for a customer's definition of quality. Evals turns accepted work, corrections, and failures into rubrics, golden sets, and routing decisions — the loop that makes every completed task improve the next one. You will build that loop.\r\nWhat you will do\r\nBuild the rubric engine that lets domain experts express what good looks like in terms they recognize, then scores every run against it.\r\nTurn accepted work into golden sets automatically, and corrections into labeled failure modes.\r\nDrive routing with eval results: which model, which context, which level of human review each task deserves.\r\nDesign the dashboards and review surfaces where customer experts judge, correct, and approve agent work.\r\nYou will thrive in this role if you\r\nHave worked on ML evaluation, data quality, or human-in-the-loop systems and know how easily metrics drift from meaning.\r\nAre rigorous about statistics but pragmatic about products — an eval no expert will use measures nothing.\r\nWant your work to be the company's answer to the question every enterprise buyer asks: how do you know it's right?","datePosted":"2026-06-25T01:16:36.988Z","dateModified":"2026-06-25T01:16:36.988Z","hiringOrganization":{"@type":"Organization","name":"Context","sameAs":"https://jobsearcher.com"},"jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Millbrae","addressRegion":"CA","addressCountry":"US"}},"identifier":{"@type":"PropertyValue","name":"JobSearcher","value":"e902a88e4062520da0ff75fd"},"url":"https://jobsearcher.com/jobs/e902a88e4062520da0ff75fd"}}