Software Engineer
Millbrae, CA
March 31st, 2026
You'll be hands-on in improving the real-world behavior of our AI systems: tracing and fixing runtime issues, building agent simulators, designing LLM evals and QA tools, and interfacing with client data. This is a role for builders who like prompt-level debugging, LLM system testing, and building infrastructure that improves our AI agents' performance.
You'll work across our AI agent platform: writing prompts, debugging runtime issues, building agent simulation tooling, creating evals, interfacing with client data, and helping us monitor system behavior at scale. This is not a model training role; it's an applied systems position focused on behavior, infrastructure, and debugging real-world agents in production.
You will be working at the forefront of agentic AI, where you'll be pushing the boundaries of our agents' capabilities.
Some examples of what you might work on:
Trace and fix runtime bugs, then write regression tests.
Design evaluation datasets to simulate realistic workflows or red-team our system.
Build internal tooling for QA and agent simulation.
Normalize and transform messy client data for system integration.
Set up automatic testing and latency tracking infrastructure.
Create dashboards and observability tooling for agentic system behavior.
Expand on our existing eval & testing framework and agent simulation infrastructure.
Technical Skills
Proficiency in TypeScript
Strong generalist software engineer
Strong debugging skills. You can trace runtime failures, dig through logs, and pinpoint issues in async or multi-step agent systems.
Data transformation and ingestion. You can build pipelines to normalize and convert unstructured data for use in AI systems.
Strong understanding of system design, including distributed systems and reliability/performance tradeoffs
Experience using modern AI coding tools (e.g. Cursor, GitHub Copilot, Claude)
Excellent documentation and testing discipline
Proficiency with Git
Soft Skills
You care about improving agent behavior. This is an applied systems position focused on behavior, infrastructure, and debugging real-world agents in production. You will be working at the forefront of agentic AI, where you'll be pushing the boundaries of our agents' capabilities.
You're high agency. AKA "agentic" ;) You can thrive with minimal structure. You are internally motivated. You proactively seek out ways to create value for your team.
You don't mind getting in the weeds. Improving agent performance requires diving deep into the details: identifying and understanding real-world edge cases, editing prompts to address them, and writing evals to cover them in the future. Sound exciting? You'll thrive. Sound tedious? You won't.
You're comfortable with ambiguity. You work well when specs are loose, or when the solution space spans prompts, code, and even a little RLHF.
You learn fast and move fast. You can pattern-match from past systems work and adapt to LLM-specific edge cases quickly.
We're looking for engineers with 2-7 years of experience who have worked closely with LLMs or AI agents in production systems. This is not a model R&D role; it's about applying AI to real-world use cases: debugging behavior, designing evals, and building the infrastructure to scale intelligent systems.
You might be a strong fit if:
You've created internal tools or frameworks to support QA, evals, or agent simulation, and care about making complex systems observable and testable.
You've contributed to fast-paced product cycles involving AI behavior, latency, and user experience, and you're comfortable validating behavior by inspecting outputs, not just logs.
Nice to have:
Experience with multi-agent systems, TTS/NLP pipelines, or structured output validation.
Familiarity with testing frameworks, LangChain-style agent orchestration, or in-house eval harnesses.
Experience with prompt engineering, LLM evals, and agent orchestration. You're comfortable writing and refining prompts, crafting evals, and reasoning about LLM outputs.