Research Engineer - Data Infrastructure

Seer | Palo Alto, CA | May 28th, 2026
Senior / Staff Data Infrastructure Machine Learning Engineer

We are building advanced intelligent systems designed to operate in complex real-world environments. Our team develops the full stack, from high-performance hardware and distributed systems infrastructure to large-scale machine learning platforms and multimodal foundation models.

Backed by significant funding and operating at the intersection of AI, infrastructure, and large-scale systems engineering, we are investing heavily in research, infrastructure, and production-scale deployment to build next-generation intelligent systems.

We are hiring Senior and Staff-level Data Infrastructure Machine Learning Engineers to scale the systems powering our ML training data platform, from ingestion and storage to indexing, retrieval, observability, and throughput optimization across massive multimodal datasets.

What You’ll Do

Build and Scale High-Throughput Data Infrastructure
- Architect, build, and operate distributed data infrastructure capable of processing and managing billions of video and multimodal data samples
- Design systems with strong guarantees around reliability, latency, scalability, and cost efficiency
- Optimize cloud object storage, metadata systems, databases, and large-scale distributed storage architectures

Develop Large-Scale Indexing and Retrieval Systems
- Build efficient indexing and retrieval systems to support rapid dataset querying, filtering, and iteration
- Improve data access patterns and retrieval performance for research and production ML workflows
- Design scalable metadata and search infrastructure for multimodal datasets

Improve Observability and Reliability
- Develop monitoring, alerting, failure recovery, and performance optimization frameworks for large-scale data pipelines
- Build tooling to identify bottlenecks and improve operational visibility across distributed systems
- Optimize workload balancing and throughput across distributed compute and storage infrastructure

Manage Data Lifecycle and Reproducibility
- Build systems for artifact management, dataset versioning, lineage tracking, and reproducibility across training workflows
- Ensure traceability and consistency across evolving datasets and training runs
- Develop lightweight internal tooling enabling engineers and researchers to explore and analyze data at scale

Support ML and Vision-Language Workloads
- Integrate and scale vision-language model (VLM) inference within distributed data pipelines
- Support automated enrichment, filtering, metadata generation, and preprocessing workflows
- Collaborate closely with ML systems and research teams to improve data quality and training velocity

What We’re Looking For
- 5+ years of experience in data infrastructure, distributed systems, ML infrastructure, or related fields
- Strong experience building and operating large-scale distributed data pipelines
- Deep understanding of:
  - Distributed systems architecture
  - Databases and metadata systems
  - Indexing and retrieval strategies
  - Cloud storage architectures
- Experience optimizing throughput, workload balancing, and cost-performance tradeoffs in cloud environments
- Hands-on experience with distributed processing frameworks such as Ray or Spark
- Strong observability, monitoring, and production reliability experience
- Strong software engineering fundamentals with the ability to own systems end-to-end

Level Expectations
- Senior engineers are expected to execute complex systems work with strong technical depth and increasing ownership
- Staff-level engineers are expected to define architectural direction, drive technical strategy, and independently lead major infrastructure initiatives

Preferred Experience
- Experience managing large multimodal datasets
- Familiarity with ML training workflows and data lifecycle management
- Experience running large-scale ML inference workloads in distributed or cloud environments
- Familiarity with vision-language models (VLMs)
- Experience working with real-world sensor data such as video, telemetry, or time-series streams
- Familiarity with data warehouse technologies such as Snowflake, BigQuery, or Redshift
- Experience with data versioning and lineage systems such as DVC, Delta Lake, or similar tooling

Why This Role Matters
- Build the foundational data infrastructure that directly impacts model quality and system performance
- Collaborate closely with ML systems and research teams on problems with immediate and measurable impact
- Operate with high ownership in a small, highly technical environment
- Help scale intelligent systems operating in real-world environments

About the Company

We are a research-driven AI company focused on building scalable intelligent systems capable of robust operation in dynamic environments. By combining advances in machine learning, distributed systems, and infrastructure engineering, we aim to push the frontier of large-scale AI systems.

We are committed to building an inclusive and diverse workplace and encourage applicants from all backgrounds to apply.