JOBSEARCHER

Research Engineer - Interpretability Systems

🚨 Research Engineer – Interpretability Systems
📍 San Francisco, CA | Onsite
🧠 Early-stage AI research lab | Revenue-generating

An AI research lab working at the frontier of interpretability, alignment, and reinforcement learning is hiring Research Engineers focused on understanding what's happening inside large language models.

This role is for engineers who want to build the experimental systems that make interpretability research possible - not production ML, MLOps, or large-scale training infra.

You'll work on:
🔍 Activation tracing & mechanistic analysis
🧪 Custom RL-style environments for alignment research
🧠 Probing internal representations
🎯 Detecting latent concepts like deception, goals, uncertainty, or hidden objectives
🛠️ Activation-level steering beyond prompting and fine-tuning
📊 New benchmarks for model consistency and robustness

The work is fast, experimental, and greenfield: build custom tooling, test research ideas, get results, move on.

Ideal background:
✅ Strong software engineering fundamentals
✅ Experience with experimental ML / research systems
✅ Comfort working close to model internals
✅ Interest in interpretability, alignment, RL, or mechanistic understanding
✅ PhD helpful, not required

This is not a role for scaling pipelines or maintaining production systems. It's for people who enjoy ambiguous problems, fast research cycles, and building new tools from first principles.

Interested? Apply and drop me a message!