Senior Software Engineer & LLM Code Trainer
Senior Software Engineer - AI Training Data

Position Details
Job Title: Senior Software Engineer
Location: San Francisco, CA (Remote)
Time Zone Requirement: US time zone overlap, PST (GMT-8)

Job Summary
The company is looking for a Senior Software Engineer to contribute to the development and evaluation of AI training data for a leading expert human data platform for AI agents and LLMs. This unique role sits at the intersection of software engineering and artificial intelligence, helping companies build better, safer, and more capable models.

Key Responsibilities
Create and review coding tasks based on real-world software engineering scenarios (debugging, refactoring, code generation, API usage, automated tests, performance, security, edge cases).
Write high-quality reference solutions that are correct, clear, testable, and aligned with requirements.
Evaluate AI-generated code and responses using structured rubrics (correctness, clarity, security, performance, maintainability, instruction-following).
Compare multiple model responses, select the strongest answer, and justify decisions with technical reasoning.
Identify bugs, hallucinated APIs, missing edge cases, weak explanations, and poor engineering decisions in AI outputs.
Work with terminal-based development workflows (testing, debugging, managing dependencies, navigating repositories).
Follow detailed guidelines consistently and participate in calibration activities to ensure high-quality evaluations.

Core Requirements
Experience: 5+ years of professional software engineering experience in a backend, fullstack, or systems role.
Programming Languages: Strong proficiency in at least one core language (Python, JavaScript/TypeScript, Go, Java, C++, or SQL).
Tools: Hands-on experience with Terminal-Bench, Git, command line/terminal, and common development workflows.
Evaluation Skills: Ability to evaluate code critically regarding design, security, and maintainability.
AI Experience: Prior experience in AI data production, RLHF, data annotation, or LLM evaluation projects preferred.
Communication: Excellent written and verbal communication skills in English.
Work Style: Ability to work independently in a remote, asynchronous, fast-paced environment with high attention to detail.

Nice-to-Have
Experience with Python-heavy workflows, automated testing frameworks, Docker, Linux, bash, or containerized environments.
Experience with repo-level code reasoning, large codebases, or open-source contributions.
Background in backend systems, data engineering, DevOps, infrastructure, security, or large codebases.