Software Engineer
Millbrae, CA
March 31st, 2026
You'll be hands-on in improving the real-world behavior of our AI systems: tracing and fixing runtime issues, building agent simulators, designing LLM evals and QA tools, and interfacing with client data. This is a role for builders who like prompt-level debugging, LLM system testing, and building infrastructure that improves our AI agents' performance.
You'll work across our AI agent platform: writing prompts, debugging runtime issues, building agent simulation tooling, creating evals, interfacing with client data, and helping us monitor system behavior at scale. This is not a model training role; it's an applied systems position focused on behavior, infrastructure, and debugging real-world agents in production.
You will be working at the forefront of agentic AI, where you'll be pushing the boundaries of our agents' capabilities.
Some examples of what you might work on:
Trace and fix runtime bugs, then write regression tests.
Design evaluation datasets to simulate realistic workflows or red-team our system.
Build internal tooling for QA and agent simulation.
Normalize and transform messy client data for system integration.
Set up automatic testing and latency tracking infrastructure.
Create dashboards and observability tooling for agentic system behavior.
Expand on our existing eval & testing framework and agent simulation infrastructure.
Technical Skills
Proficiency in TypeScript
Strong generalist software engineer
Strong debugging skills. You can trace runtime failures, dig through logs, and pinpoint issues in async or multi-step agent systems.
Data transformation and ingestion. You can build pipelines to normalize and convert unstructured data for use in AI systems.
Strong understanding of system design, including distributed systems and reliability/performance tradeoffs
Experience using modern AI coding tools (e.g. Cursor, GitHub Copilot, Claude)
Excellent documentation and testing discipline
Proficiency with Git
Soft Skills
You care about improving agent behavior. The work centers on behavior, infrastructure, and debugging real-world agents in production.
You're high agency. AKA "agentic" ;) You can thrive with minimal structure. You are internally motivated. You proactively seek out ways to create value for your team.
You don't mind getting in the weeds. Improving agent performance requires diving deep into the details: identifying and understanding real-world edge cases, editing prompts to address them, and writing evals to cover them in the future. Sound exciting? You'll thrive. Sound tedious? You won't.
You're comfortable with ambiguity. You work well when specs are loose, or when the solution space spans prompts, code, and even a little RLHF.
You learn fast and move fast. You can pattern-match from past systems work and adapt to LLM-specific edge cases quickly.
We're looking for engineers with 2-7 years of experience who have worked closely with LLMs or AI agents in production systems. This is not a model R&D role; it's about applying AI to real-world use cases: debugging behavior, designing evals, and building the infrastructure to scale intelligent systems.
You might be a strong fit if:
You've created internal tools or frameworks to support QA, evals, or agent simulation, and care about making complex systems observable and testable.
You've contributed to fast-paced product cycles involving AI behavior, latency, and user experience, and you're comfortable validating behavior by inspecting outputs, not just logs.
Nice to have:
Experience with multi-agent systems, TTS/NLP pipelines, or structured output validation.
Familiarity with testing frameworks, LangChain-style agent orchestration, or in-house eval harnesses.
Experience with prompt engineering, LLM evals, and agent orchestration. You're comfortable writing and refining prompts, crafting evals, and reasoning about LLM outputs.