Software Engineer
Millbrae, CA
March 31st, 2026
You'll be hands-on in improving the real-world behavior of our AI systems: tracing and fixing runtime issues, building agent simulators, designing LLM evals and QA tools, and interfacing with client data. This is a role for builders who like prompt-level debugging, LLM system testing, and building infrastructure that improves our AI agents' performance.
You'll work across our AI agent platform: writing prompts, debugging runtime issues, building agent simulation tooling, creating evals, interfacing with client data, and helping us monitor system behavior at scale. This is not a model-training role; it's an applied systems position focused on behavior, infrastructure, and debugging real-world agents in production.
You will be working at the forefront of agentic AI, where you'll be pushing the boundaries of our agents' capabilities.
Some examples of what you might work on:
Trace and fix runtime bugs, then write regression tests.
Design evaluation datasets to simulate realistic workflows or red-team our system.
Build internal tooling for QA and agent simulation.
Normalize and transform messy client data for system integration.
Set up automated testing and latency-tracking infrastructure.
Create dashboards and observability tooling for agentic system behavior.
Extend our existing eval and testing framework and agent simulation infrastructure.
Technical Skills
Proficiency in TypeScript
Strong generalist software engineering skills
Strong debugging skills. You can trace runtime failures, dig through logs, and pinpoint issues in async or multi-step agent systems.
Data transformation and ingestion. You can build pipelines to normalize and convert unstructured data for use in AI systems.
Strong understanding of system design, including distributed systems and reliability/performance tradeoffs
Experience using modern AI coding tools (e.g. Cursor, GitHub Copilot, Claude)
Excellent documentation and testing discipline
Proficiency with Git
Soft Skills
You care about improving agent behavior.
You're high-agency. AKA "agentic" ;) You thrive with minimal structure. You're internally motivated. You proactively seek out ways to create value for your team.
You don't mind getting in the weeds. Improving agent performance requires diving deep into the details: identifying and understanding real-world edge cases, editing prompts to address them, and writing evals to cover them in the future. Sound exciting? You'll thrive. Sound tedious? You won't.
You're comfortable with ambiguity. You work well when specs are loose, or when the solution space spans prompts, code, and even a little RLHF.
You learn fast and move fast. You can pattern-match from past systems work and adapt to LLM-specific edge cases quickly.
We're looking for engineers with 2-7 years of experience who have worked closely with LLMs or AI agents in production systems. This is not a model R&D role; it's about applying AI to real-world use cases: debugging behavior, designing evals, and building the infrastructure to scale intelligent systems.
You might be a strong fit if:
You've created internal tools or frameworks to support QA, evals, or agent simulation, and care about making complex systems observable and testable.
You've contributed to fast-paced product cycles involving AI behavior, latency, and user experience, and you're comfortable validating behavior by inspecting outputs, not just logs.
Nice to have:
Experience with multi-agent systems, TTS/NLP pipelines, or structured output validation.
Familiarity with testing frameworks, LangChain-style agent orchestration, or in-house eval harnesses.
Experience with prompt engineering, LLM evals, and agent orchestration. You're comfortable writing and refining prompts, crafting evals, and reasoning about LLM outputs.