Research Engineer

Seer
Palo Alto, CA
May 28th, 2026
Data Infrastructure Machine Learning Engineer

We are building advanced intelligent robotic systems designed to operate in complex real-world environments. Our team develops the full stack, from high-performance hardware and robotic systems to large-scale machine learning infrastructure and advanced foundation models that power autonomous behavior.

Backed by substantial funding and operating at the intersection of robotics, AI, and distributed systems, we are investing aggressively in research, infrastructure, hardware development, and large-scale deployment to bring general-purpose robotics into production.

We are seeking Data Infrastructure Machine Learning Engineers across senior to staff levels to build and scale the systems powering large-scale training data pipelines. This role focuses on infrastructure for ingestion, storage, indexing, retrieval, observability, and throughput optimization for massive multimodal datasets.

What You’ll Do

Build and Scale Data Infrastructure
- Architect and operate high-throughput data infrastructure capable of processing and managing billions of video and multimodal data samples
- Design scalable systems with strong guarantees around reliability, latency, and cost efficiency
- Optimize distributed storage systems, metadata services, and cloud object storage for large-scale datasets

Develop Large-Scale Retrieval and Indexing Systems
- Build efficient indexing and retrieval systems for rapid dataset querying, filtering, and iteration
- Support research and production workflows requiring fast access to large multimodal datasets
- Improve data access patterns and storage performance across distributed systems

Improve Observability and Reliability
- Develop monitoring, alerting, and failure recovery systems for large-scale data pipelines
- Build performance analysis and observability tooling to improve throughput and system reliability
- Identify bottlenecks and optimize workload balancing across distributed compute and storage infrastructure

Manage Data Lifecycle and Reproducibility
- Build systems for dataset versioning, lineage tracking, and reproducibility across training runs
- Manage data artifacts and metadata consistency throughout the ML lifecycle
- Develop internal tooling and interfaces that enable engineers and researchers to explore and analyze large datasets efficiently

Support ML and VLM Integration
- Integrate and scale vision-language model (VLM) inference workloads within distributed data pipelines
- Support data enrichment, filtering, metadata generation, and automated labeling workflows
- Collaborate closely with ML systems and research teams to improve training data quality and iteration speed

What We’re Looking For
- 5+ years of experience in data infrastructure, distributed systems, ML infrastructure, or related areas
- Strong experience building and operating large-scale data pipelines and distributed systems
- Deep understanding of:
  - Distributed systems architecture
  - Databases and metadata systems
  - Indexing and retrieval strategies
  - Cloud storage architectures
- Experience optimizing throughput, workload balancing, and cost-performance tradeoffs in cloud environments
- Hands-on experience with distributed processing frameworks such as Ray or Spark
- Strong observability, monitoring, and production reliability experience
- Strong software engineering fundamentals with the ability to own systems end-to-end

Level Expectations
- Senior engineers are expected to execute complex systems work with strong technical fundamentals and growing ownership
- Staff-level engineers are expected to define architectural direction, drive technical strategy, and independently own major infrastructure decisions

Preferred Experience
- Experience managing large multimodal datasets
- Familiarity with ML training workflows and data lifecycle management
- Experience running large-scale inference workloads in distributed or cloud environments
- Familiarity with vision-language models (VLMs)
- Experience with robotics data formats or real-world sensor data such as video, telemetry, or teleoperation logs
- Familiarity with data warehouse technologies such as Snowflake, BigQuery, or Redshift
- Experience with data versioning and lineage tooling such as DVC, Delta Lake, or similar systems

Why This Role Matters
- Build the foundational data infrastructure that directly impacts model quality and research velocity
- Work closely with ML systems and research teams on problems with immediate and measurable impact
- Operate with high ownership in a small, highly technical environment
- Help power intelligent robotic systems operating in real-world environments at scale

About the Company

We are a research-driven AI and robotics company focused on building scalable intelligent systems capable of robust real-world operation. By combining advances in machine learning, robotics, distributed systems, and infrastructure engineering, we aim to push the frontier of embodied intelligence.

We are committed to building an inclusive and diverse workplace and encourage applicants from all backgrounds to apply.