
Senior Software Engineer & LLM Code Trainer

ARCHIVED
Kake | Millbrae, CA | May 28th, 2026


We are looking for a Senior Software Engineer to contribute to the development and evaluation of AI training data for a leading expert human data platform for AI agents and LLMs.

In this role, you will work at the intersection of software engineering and artificial intelligence, helping AI labs and companies build better, safer, and more capable models. You will leverage your deep technical expertise to write prompts, produce reference-quality code solutions, evaluate model outputs, and provide the structured human signal that makes AI systems smarter.

This is not a traditional engineering role - it is a unique opportunity for senior engineers who want to shape how the next generation of AI understands, generates, and reasons about code.

Key Responsibilities

- Create and review coding tasks based on real-world software engineering scenarios, including debugging, refactoring, code generation, API usage, automated tests, performance, security, and edge cases
- Write high-quality reference solutions that are correct, clear, testable, and aligned with task requirements
- Evaluate AI-generated code and responses using structured rubrics, assessing correctness, clarity, security, performance, maintainability, and instruction-following
- Compare multiple model responses, select the strongest answer, and justify your decision with clear technical reasoning
- Identify bugs, hallucinated APIs, missing edge cases, weak explanations, and poor engineering decisions in AI-generated outputs
- Work with terminal-based development workflows when needed, including running tests, debugging issues, managing dependencies, and navigating repositories
- Follow detailed guidelines consistently and participate in calibration activities to ensure high-quality, reliable evaluations

Core Requirements

- 5+ years of professional software engineering experience in a backend, fullstack, or systems role
- Strong proficiency in at least one core programming language, ideally Python, JavaScript/TypeScript, Go, Java, C++, or SQL
- Hands-on experience with Terminal-Bench, with the ability to evaluate AI agent performance on terminal-based tasks including compiling code, running tests, managing environments, and completing multi-step software engineering workflows
- Comfortable working with Git, command line/terminal, and common development workflows
- Ability to evaluate code critically - not only whether it works, but whether it is well-designed, secure, and maintainable
- Prior experience in AI data production, RLHF, data annotation, or LLM evaluation projects
- Excellent written and verbal communication skills in English
- Ability to work independently in a remote, asynchronous, fast-paced environment
- High attention to detail and the ability to follow complex, rubric-based guidelines consistently

Nice-to-Have

- Experience with Python-heavy workflows, automated testing frameworks, Docker, Linux, bash, or containerized environments
- Experience with repo-level code reasoning, large codebases, or open-source contributions
- Background in backend systems, data engineering, DevOps, infrastructure, security, or large codebases

Additional US Timezone Overlap: PST (GMT -8)

Please Note: Due to the high volume of applications, only shortlisted candidates will be contacted.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.