
Staff ML Ops Engineer - Recommendations (Americas)

Shopify
North Bend, OR
April 12th, 2026
About The Role

Shopify is the commerce platform that powers millions of merchants worldwide. Behind the product experience are ML systems that drive recommendations, search, and personalization at massive scale.

We build and maintain the operational backbone behind these systems: deployment pipelines, evaluation frameworks, data preprocessing, and the monitoring that keeps models fresh and reliable in production. Our models serve hundreds of millions of buyers, and the pipelines we build directly affect how quickly and safely we can improve merchant outcomes.

The Role

You will own the operational lifecycle of our ML systems: deployment pipelines, evaluation frameworks, data pipelines, and the monitoring and reliability layer that keeps everything running in production. You'll ensure that models go from training to production safely, that we can evaluate changes rigorously, and that the data feeding our models is fresh and correct.

This role is the connective tissue between research and production. You'll build the systems that let engineers ship model improvements with confidence and speed, while maintaining the reliability standards required to serve hundreds of millions of buyers, including during peak events like Black Friday/Cyber Monday.

This role carries real technical authority. You'll set the standards for how models get deployed and evaluated, mentor engineers on operational best practices, and drive alignment on reliability and pipeline strategy across the team. You'll influence technical direction beyond your immediate team and raise the engineering bar through hiring and technical reviews.

What You'll Do

Deployment & Rollout
- Own the model deployment pipeline end to end: export, validation, canary rollout, rollback, and A/B integration
- Build and maintain CI/CD for ML: automated testing, model validation gates, and progressive delivery
- Ensure safe, repeatable deployments with clear rollback paths and minimal manual intervention

Evaluation & Experimentation
- Build automated offline evaluation pipelines against production baselines
- Extend our experimentation framework so ML Engineers can launch and evaluate model changes with minimal friction
- Maintain evaluation datasets and ensure data freshness and correctness
- Integrate offline metrics with online A/B testing to close the feedback loop

Data Pipelines
- Own data preprocessing for training: interaction histories, feature stores, and embedding tables
- Manage workflow orchestration (Airflow or equivalent) for scheduled retraining and data refresh. You trigger and monitor training runs; the underlying GPU compute layer is owned by the infrastructure side of the team.
- Ensure data quality, lineage tracking, and pipeline idempotency
- Own data correctness and freshness; partner with infrastructure engineers on data loading throughput and efficiency

Monitoring & Reliability
- Build monitoring and alerting across training jobs, serving endpoints, and data pipelines
- Define and maintain SLOs for model freshness, serving latency, and training throughput
- Participate in incident response and drive post-mortems for ML system failures
- Identify and eliminate toil through automation

Technical Leadership
- Drive cross-team technical strategy for ML operations: identify systemic reliability risks and pipeline bottlenecks before they become incidents
- Mentor and up-level engineers on the team through pairing, design reviews, and setting operational standards
- Contribute to hiring: screen candidates, conduct technical interviews, and calibrate the engineering bar
- Write technical proposals and RFCs that shape operational direction across the organization

What We're Looking For

Required
- 7+ years in software engineering, with 5+ years focused on MLOps, data engineering, or production ML systems
- Strong experience with ML deployment pipelines: model export, validation, canary releases, and rollback strategies
- Experience with workflow orchestration for ML (Airflow, Dagster, Prefect, or similar)
- Solid Python fundamentals; comfortable working with PyTorch model artifacts and training configurations
- Production monitoring experience: you've built or operated alerting, dashboards, and SLO frameworks for ML systems
- Experience with data pipelines at scale: batch processing, feature engineering, and data quality validation
- Working proficiency with Kubernetes: able to debug pod failures, understand resource scheduling, and navigate GPU workloads
- Demonstrated technical leadership: you've driven operational strategy, written technical proposals, and influenced engineering direction beyond your immediate team
- Track record of mentoring engineers and raising the reliability bar on a team

Preferred
- Experience with large-scale data warehouses (BigQuery or equivalent) for offline evaluation and metrics
- Hands-on experience with experiment tracking and A/B testing frameworks
- Experience operating recommendation or retrieval systems at scale
- Familiarity with model compression workflows in production (quantization, pruning, distillation)
- Experience with cloud-native ML orchestration (SkyPilot, Ray, or similar)

How We Work
- You'll pair directly with ML Engineers. Understanding their models well enough to build the right operational workflows is part of the job.
- We prefer automation over runbooks. If a process can be scripted, it should be.
- On-call is shared. When you're on rotation, your scope is pipeline failures, data freshness alerts, deployment rollbacks, and evaluation integrity, and you own it end to end.
- You'll dig into Airflow DAG failures, data drift alerts, and deployment validation issues. This is a deeply operational role with high production stakes.
- Research and production are the same codebase. You'll see your operational decisions reflected in real model quality and real merchant outcomes.
- Shopify operates on high trust and low process. You'll have real ownership and the autonomy to make decisions, not just execute tickets.

What Success Looks Like
- In 3 months: You've onboarded to the deployment and evaluation pipelines, shipped at least one meaningful improvement to deployment safety or developer experience, and can independently debug issues across the operational stack.
- In 6 months: You own a major subsystem (deployment pipeline, evaluation framework, or data pipelines). Researchers are shipping model changes faster or more safely because of improvements you've made.
- In 12 months: You've shaped the operational roadmap for ML systems and influenced engineering direction beyond the team. Deployments are faster and safer, evaluation is more rigorous, and the team trusts the pipelines you've built. Other engineers across the organization come to you for guidance on ML operational best practices. You've made the team stronger through hiring and mentorship.

About Shopify

Opportunity is not evenly distributed. Shopify puts independence within reach for anyone with a dream to start a business. We propel entrepreneurs and enterprises to scale the heights of their potential. Since 2006, we've grown to over 8,300 employees and generated over $1 trillion in sales for millions of merchants in 175 countries.

This is life-defining work that directly impacts people's lives as much as it transforms your own. This is putting the power of the few in the hands of the many, building a future with more voices rather than fewer, and creating more choices instead of an elite option.

About You

Moving at our pace brings a lot of change, complexity, and ambiguity, and a little bit of chaos. Shopifolk thrive on that and are comfortable being uncomfortable. That means Shopify is not the right place for everyone.

Before you apply, consider if you can:
- Care deeply about what you do and about making commerce better for everyone
- Excel by seeking professional and personal hypergrowth
- Keep up with an unrelenting pace (the week, not the quarter)
- Be resilient and resourceful in the face of ambiguity and thrive on (rather than endure) change
- Bring critical thought and opinion
- Put AI agents and tools to work on the tasks they're built for, and focus on the work only humans can do
- Embrace differences and disagreement to get shit done and move forward
- Work digital-first for your daily work

We may use AI-enabled tools to screen, select, and assess applications. All AI outputs are reviewed and validated by our recruitment team.