Founding Research Engineer
Research · Full-time · San Francisco

Triage is an applied research lab building adaptive security and safety alignment infrastructure for AI systems: defending against novel attack vectors and catching misalignment at inference time in deployed systems. In December 2025, Triage raised a $1.5M pre-seed round led by BoxGroup.

Our team is small, technical, and in-person. We like people who are truth-seeking, have strong takes on the work, and ship.

About the Role

We're hiring a Founding Research Engineer to drive the research agenda that makes our detection infrastructure work. Research at Triage is not separate from product. The sentries we train become the detectors our customers run in production. The attacks we discover become test cases in the eval harness. The misalignment failures we characterize in frontier models become the basis for per-tenant adapters that customers deploy.

Active threads span post-training (reinforcement learning, DPO-based behavioral steering, preference modeling), adversarial red-teaming of frontier systems, chain-of-thought faithfulness and divergence detection, and small-model distillation for per-tenant deployment. What gets prioritized depends partly on you.

What you'll do

- Run adversarial research on frontier model behavior: characterize attack classes (instruction hijacking, CoT exfiltration, tool-call manipulation, output smuggling, emergent misalignment) and build the red-teaming infrastructure that converts those discoveries into training data.
- Train, evaluate, and iterate on the small adversarially trained models (sentries) that run per-tenant in production, with end-to-end ownership of data, eval harness, training, and deployment feedback loops.
- Drive specific research bets on per-tenant DPO, CoT faithfulness, divergence-token detection, cross-model generalization, and preference modeling for customer-specific safety policies.
- Publish selectively: coordinated disclosures, technical writeups, open-source tooling, and peer-reviewed work where it serves the mission.

You may be a fit if

- You've done LLM post-training work: RLHF, DPO, instruction tuning, preference modeling, adversarial training, or closely related. Shipped projects matter more than credentials.
- You have technical opinions about misalignment: why reward hacking, specification gaming, and deceptive reasoning emerge, and what signals can be used to detect them.
- You go from idea to working system in days, not months.
- You genuinely find breaking things fun. A large fraction of this job is finding out how frontier models fail in ways their developers did not anticipate.
- You write well: you can write a coordinated disclosure that platform teams take seriously and documentation that actually gets used.
- You're comfortable reading papers released yesterday, implementing them this week, and knowing when to ignore them.
- You're currently pursuing a Bachelor's, Master's, or PhD, or you have prior industry experience in ML research or applied AI. Either works.

Compensation

$150K – $230K plus founding-level equity.

Applying

If there appears to be a fit, we'll reach out to schedule 1–2 short technical conversations. After that, we'll arrange an on-site where you'll work on a small research problem, discuss ideas, and meet the team.