Founding RL Researcher

Lanturn · Santa Clara, CA · April 9th, 2026
Founding Research Scientist (Long-Horizon RL) at Lanturn

Location: San Francisco (preferred) / Remote (US)
Compensation: $300K base + 0.5–1% equity
Type: Full-time · Founding Team

At Lanturn, we are building the next generation of reinforcement learning systems for real-world agents. Our focus is on enabling AI systems to learn from behavioral data and long-horizon workflows, through:
- High-fidelity RL environments
- Synthetic data generation
- Closed-loop training systems

We are looking for a Founding RL Researcher to push the frontier of:
- Long-horizon RL
- Environment design
- Post-training for agents

About us:
Lanturn is building the end-to-end behavioural learning stack for AI systems. We believe current approaches to RL and post-training are limited by short-horizon optimisation, weak or proxy reward signals, and a lack of grounded environments. Our approach is to build closed-loop RL systems where environments, data, training, and evaluation are tightly integrated and based on real-world behavioral data.

The role:
As a Founding RL Researcher, you will lead efforts to develop novel reinforcement learning algorithms and environments for training autonomous agents.
You will work across:
- Algorithm design
- Environment modelling
- Training systems
- Evaluation frameworks

This role sits at the intersection of:
- Frontier-lab-style RL research (environments + algorithms)
- Modern LLM post-training (RLHF, preference optimisation)

Key responsibilities:
- Design and implement RL systems for long-horizon tasks (10–100+ steps)
- Develop and extend modern post-training methods:
  - PPO, DPO, ORPO
  - GRPO / GRPO++ and ranking-based optimization methods
- Build RL environments grounded in real-world workflows
- Work on meta-RL and adaptive learning systems:
  - Generalization across tasks
  - Rapid adaptation to new environments
- Design reward systems for:
  - Behavioural correctness
  - Efficiency and robustness
- Develop evaluation frameworks aligned with real-world outcomes
- Collaborate with engineering teams to scale training systems

Ideal candidate:
You are a researcher with strong theoretical grounding and real-world system intuition, capable of working on open-ended problems in RL. You thrive in environments where:
- Problems are not well-defined
- Systems must be built from first principles
- Research directly translates into deployed systems

Minimum qualifications:
- Experience at a top-tier AI lab or company: OpenAI, DeepMind, Anthropic, FAIR, or equivalent
- Strong background in reinforcement learning and post-training systems
- Experience training large-scale models (LLMs or similar)
- Strong programming skills (Python, PyTorch/JAX)

Preferred qualifications:
- Experience with long-horizon RL or sequential decision-making systems
- Experience designing or working with RL environments
- Familiarity with preference optimization (DPO, ORPO), RLHF pipelines, and automated RL environment generation
- Experience with meta-RL / adaptive learning systems
- Strong publication record in top-tier ML conferences

Core technical skills:
- Deep understanding of policy gradient methods (PPO and beyond), KL-regularized optimization, and credit assignment in long-horizon settings
- Experience with cascading RL pipelines (SFT → RL → evaluation), distributed training systems, and stability and scaling challenges
- Strong intuition for exploration vs exploitation, reward shaping vs reward learning, and trajectory-level optimization

What makes this role unique?
- Focus on long-horizon behavioral learning, not short-form RLHF
- Treats environment design and generation as a first-class problem
- Opportunity to define GRPO++-style next-generation algorithms and publish at NeurIPS

Why join Lanturn?
- Founding ownership (0.5–1% equity)
- Work on unsolved problems in RL and agent systems
- High autonomy and research freedom
- Direct impact on how real-world AI systems are trained
- Work directly with second-time founders who have worked with various big tech companies and enterprises

If you've worked on RL at a top lab or have production RL experience and want to push beyond current paradigms into real-world, long-horizon intelligence, this is your opportunity.