Research Engineer - Interpretability Systems (Santa Clara)
San Francisco, CA | Onsite
Early-stage AI research lab | Revenue-generating

An AI research lab working at the frontier of interpretability, alignment, and reinforcement learning is hiring Research Engineers focused on understanding what's happening inside large language models.

This role is for engineers who want to build the experimental systems that make interpretability research possible - not production ML, MLOps, or large-scale training infra.

You'll work on:
- Activation tracing & mechanistic analysis
- Custom RL-style environments for alignment research
- Probing internal representations
- Detecting latent concepts like deception, goals, uncertainty, or hidden objectives
- Activation-level steering beyond prompting and fine-tuning
- New benchmarks for model consistency and robustness

The work is fast, experimental, and greenfield: build custom tooling, test research ideas, get results, move on.

Ideal background:
- Strong software engineering fundamentals
- Experience with experimental ML / research systems
- Comfort working close to model internals
- Interest in interpretability, alignment, RL, or mechanistic understanding
- PhD helpful, not required

This is not a role for scaling pipelines or maintaining production systems. It's for people who enjoy ambiguous problems, fast research cycles, and building new tools from first principles.

Interested? Apply and drop me a message!