Applied Reinforcement Learning Engineer
Redmond, WA
March 30th, 2026
About Centific

Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem, comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets, to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.

Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.

About Job

Role: Applied Reinforcement Learning Engineer
Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)

About the Team

Centific AI Research advances foundational AI models and applications through reinforcement learning, alignment, and human-centered intelligence. Our mission is to transform data, signals, and human insight into next-generation intelligent systems that redefine enterprise intelligence.

We're building a governed RL environment platform that enables enterprises to safely iterate and improve AI agent workflows through simulation-based learning, bridging human-labeled signal creation with automated RL training for high-stakes operations.

Role Overview

As an Applied RL Engineer, you will design and build RL environments that simulate complex enterprise workflows and train intelligent agents within them.
You'll work at the intersection of RL research and production systems, translating customer requirements into bespoke simulation environments and post-training pipelines that deliver measurable improvements to AI agent performance.

This role requires deep expertise in both classical RL methodologies and modern LLM-based agent architectures. You'll shape our product direction and help make RL accessible to enterprise customers who need safe, compliant ways to improve their AI systems.

Core RL Competencies

Foundational RL
- MDPs & value methods: State/action spaces, Q-learning, DQN, Double DQN, Dueling DQN
- Policy gradient methods: REINFORCE, Actor-Critic, A2C/A3C, variance reduction
- Advanced optimization: PPO, TRPO, SAC, trust regions, entropy regularization
- TD learning: TD(0), TD(λ), eligibility traces, bootstrapping methods

LLM Alignment & Post-Training
- RLHF pipelines: Reward model training, preference learning, human feedback integration
- Direct optimization: DPO, IPO, KTO, offline preference optimization
- Group-based methods: GRPO, RLOO, sample-efficient policy improvement
- Reward modeling: Bradley-Terry models, reward hacking mitigation, KL constraints

Environment Design
- Gymnasium/OpenAI Gym: Custom environments, observation/action spaces, wrapper patterns
- Reward engineering: Sparse vs. dense rewards, potential-based shaping, intrinsic motivation
- Verifier design: Programmatic reward functions, outcome verification, ground-truth evaluation
- Simulation: Sim-to-real transfer, domain randomization, multi-agent dynamics

Advanced Techniques
- Offline RL: CQL, BCQ, IQL for learning from fixed datasets without environment interaction
- Model-based RL: World models, Dreamer, MuZero, learned dynamics
- Hierarchical RL: Options framework, goal-conditioned policies, temporal abstraction
- Imitation & exploration: Behavioral cloning, GAIL, curiosity-driven exploration, UCB

Key Responsibilities
- Design and build custom RL environments (digital twins) simulating enterprise workflows: document processing, compliance, onboarding, support automation
- Post-train LLM-based agents on domain-specific tasks using PPO, GRPO, DPO, and RLHF
- Build end-to-end pipelines converting human-labeled traces into RL training data
- Architect multi-step reasoning agents with tool-calling and closed learning loops
- Design reward functions, verifiers, and validation frameworks for pre-deployment testing
- Translate cutting-edge RL research into production systems; contribute to publications

Required Qualifications
- Deep RL expertise: 3+ years hands-on experience with environment design, reward engineering, policy optimization
- LLM post-training: Experience fine-tuning LLMs using RLHF, DPO, PPO, or similar
- Production skills: Software engineering beyond research with scalable pipelines and training infrastructure
- Agentic AI: Experience with LLM-based agents, tool use, multi-step reasoning
- Technical stack: Strong Python; Gymnasium, RLlib, Stable Baselines; PyTorch/JAX/TensorFlow
- Education: MS/PhD in CS, ML, or related field (or equivalent experience)

Preferred Qualifications
- Publications at NeurIPS, ICML, ICLR, ACL, or similar venues
- Enterprise workflow experience in healthcare, finance, logistics, or compliance
- Open-source contributions to CleanRL, TRL, veRL, or agent frameworks
- Experience with world models, synthetic data generation, and simulation
- Distributed training and large-scale RL experimentation

Why Join Centific
- Lead the frontier: Shape a new discipline at the intersection of RL, simulation, and enterprise AI
- Ship your science: See your research power real systems across healthcare, finance, and safety
- Collaborate with leaders: Work alongside NVIDIA, Microsoft, and the global AI community
- Build what matters: Create governed, compliant AI systems enterprises can trust

Salary: $150K - $160K Annually

Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.