Senior MLOps / ML Infrastructure Engineer (San Francisco)
Contract: 6 months on W2
Location: Remote

About the Role:
We are seeking a Senior MLOps / ML Infrastructure Engineer to join our core platform team enabling research and engineering via shared ML systems. This role focuses on building scalable, efficient, and standardized ML workflows that accelerate experimentation and deployment across the organization.

Responsibilities:
- Design, develop, and maintain scalable ML workflows and pipelines
- Build and improve ML infrastructure for training and serving, including GKE-based systems
- Automate ML workflows to improve efficiency and reduce operational overhead
- Develop robust data sampling and feature generation platforms
- Standardize ML training, deployment, and knowledge distillation pipelines
- Collaborate closely with researchers and engineers to support large-scale ML experimentation
- Drive foundational ML platform tooling and adoption

Key Projects / Initiatives:
- Scalable ML workflows and pipelines for large-scale ML systems
- Automation of end-to-end ML workflows
- GKE-based training and serving infrastructure
- Knowledge distillation and foundational training tool development

Team Overview:
- The platform team empowers researchers and engineers with shared ML systems and platforms
- Responsible for scalable ML workflows, robust infra, automation, and standardized deployment pipelines

Qualifications:

Must-have (Technical & Soft Skills):
- 5-10+ years of experience in large-scale ML systems, MLOps, or ML infrastructure
- Strong expertise in ML workflows, distributed systems, and pipeline automation
- Experience with GKE and scalable ML training/serving platforms
- Collaborative, ownership-driven, and pragmatic approach to problem-solving
- Strong communication and teamwork skills to work with research and engineering teams

Preferred Background / Industries:
- Experience at large-scale ML/AI companies (Google, Meta, Amazon, Microsoft)
- Hands-on MLE or ML infra experience, not purely theoretical or pure DevOps

Desired Attributes / Work Style:
- Reliability, consistency, and scalability mindset
- Cost-aware and methodical in approach
- Fast-moving and proactive with strong collaboration skills

Success Metrics / KPIs:
- Faster time-to-market for ML experiments
- Improved training efficiency and infra uptime
- Pipeline reliability and cost optimization
- Stable deployments and high platform adoption
- Reduced onboarding time for ML workflows