Research Engineer - Reinforcement Learning (RL) Systems & Infrastructure (Seed Infra)
San Jose, CA
April 5th, 2026
About the Team
The Seed Infrastructures team oversees the distributed training, reinforcement learning framework, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models.

Responsibilities:
- Design and build end-to-end reinforcement learning (RL) systems for large-scale models, covering rollout, training, evaluation, and deployment pipelines.
- Develop scalable and fault-tolerant RL infrastructure that operates efficiently under dynamic workloads and heterogeneous compute environments.
- Optimize distributed training performance across GPU clusters, improving throughput, resource utilization, and system stability.
- Collaborate with cross-team researchers on targeted system-algorithm co-design to translate research ideas into robust, production-grade implementations.
- Build tooling, monitoring, and debugging frameworks to ensure reliability and observability of large-scale RL training systems.

Minimum Qualifications:
- Strong background in distributed systems, large-scale ML systems, or deep learning infrastructure
- Experience building or optimizing large-scale training systems (e.g., RL, LLM, multimodal models)
- Solid engineering skills in Python/C++ and familiarity with modern ML stacks (PyTorch, distributed training frameworks, etc.)
- Experience with GPU optimization, parallelism strategies, and system-level performance tuning
- Understanding of reinforcement learning workflows (rollout, policy update, evaluation loops)

Preferred Qualifications:
- Experience with large-scale agent systems
- Familiarity with system design under heterogeneous or dynamic workloads
- Exposure to RL + LLM training or post-training pipelines