Senior Agentic AI-Harness Engineer/Researcher
Position: Agentic AI Engineer (Harness & Systems Focus)
(Evaluation Harnesses | Autonomous Systems | Long-Running Agents)

About the Firm

We are a highly reputable mid-sized investment management firm applying advanced AI to improve how investment decisions are researched and executed. Our work centers on building systems that perform reliably in noisy, dynamic, real-world environments, not controlled benchmarks.

The Opportunity

Most AI systems break down outside clean demos. This role focuses on the harder layer underneath:

How do we engineer, validate, and stress-test agentic systems so they behave reliably over long horizons in uncertain environments?

You’ll focus on harness engineering: the infrastructure, evaluation systems, and runtime scaffolding that make autonomous agents actually usable in production. This is less about isolated models and more about building the systems that prove they work.

What You’ll Do

- Design and build evaluation harnesses for long-running agent workflows (hours to days)
- Develop infrastructure to simulate complex, noisy environments for agent testing
- Create metrics and validation frameworks where success is ambiguous or delayed
- Engineer agent runtime systems: orchestration, memory, retries, and failure handling
- Build tools for observability, debugging, and introspection of agent behavior
- Partner with researchers to translate prototypes into reliable, testable systems
- Develop multi-agent test environments to evaluate coordination and failure modes
- Continuously stress-test systems against edge cases, drift, and adversarial conditions

What We’re Looking For

- Senior/staff-level experience in AI/ML systems, infrastructure, or applied research
- Strong background in systems engineering for AI, not just modeling
- Experience with evaluation harnesses, benchmarking, or simulation frameworks
- Familiarity with long-running, stateful agent systems and their failure modes
- Ability to design robust testing strategies for non-deterministic systems
- Track record of delivering in ambiguous, high-impact environments
- Deep understanding of how models behave in production (not just in theory)
- Comfort owning loosely defined problems end-to-end

Why This Role

- Work on the hardest unsolved layer of agentic AI: making it reliable
- Define how systems are evaluated, not just how they’re built
- Build infrastructure that directly impacts real-world decision-making
- Operate in a focused, low-bureaucracy environment with rapid iteration
- Tackle problems where correctness isn’t obvious, and that’s the point

Compensation

Competitive with senior/staff-level roles at leading technology companies, with performance-based upside.