Senior Research Engineer, Training Data Infrastructure in Foundation Models
Cupertino, CA
March 31st, 2026
Our team is dedicated to solving the high-quality training data problem at the scale required to train advanced Foundation Models. We believe that advanced model performance (including reasoning, coding, and agentic planning) fundamentally depends on a data-centric approach to Machine Learning. Our objective is to engineer a large-scale system that acquires, processes, and curates the data required to advance the state of the art in Artificial Intelligence. We are seeking a Senior Research Engineer who possesses a deep understanding of distributed systems and a strong intuition for Machine Learning. You will join a culture that values engineering craftsmanship, privacy, and rigorous scientific inquiry, using advanced cloud technologies to build the data systems that power our most capable models.
This position operates at the convergence of Software Engineering and Machine Learning Research. Unlike traditional backend roles, this position requires you to design systems where the outcome is the statistical distribution and quality of the data itself. You will work alongside Research Scientists to transform theoretical observations into concrete, scalable engineering solutions. Your core focus will be the architecture of our Data Acquisition, Processing, and Repository Management systems for Large Model training. You will lead technical efforts to enable active, quality-driven data curation, including filtering, deduplication, synthetic data generation, and data mixing, ensuring our models are trained on the highest-quality information available.
- Research Collaboration: Experience working within or closely with ML research organizations (e.g., as a Research Engineer), with an ability to translate research results into engineering implementations.
- Domain Knowledge: Familiarity with the lifecycle of modern LLM training, end-to-end workflows, and the underlying system architecture.
- Complex Data Types: Experience in processing complex data modalities beyond plain text, such as source code repositories, images, video, and audio.
- Education: Bachelor's degree in Computer Science, Electrical Engineering, or Mathematics.
- Technical Expertise: 4+ years of software engineering experience with a specific focus on Data Infrastructure, Distributed Systems, or AI/ML Engineering.
- Language Proficiency: Expert fluency in Python, and strong competence in systems languages such as C++.
- Cloud Architecture: Extensive experience architecting solutions on major public cloud platforms (e.g., GCP) to build scalable data systems (e.g., with Apache Beam, GCS).
- Performance Engineering: Deep experience profiling and optimizing high-throughput data systems. Demonstrated ability to debug distributed bottlenecks (e.g., stragglers, I/O saturation), optimize data formats, and provide efficient data storage solutions.