
Lead DevOps/MLOps Engineer

We're looking for a strong DevOps engineer who can help scale and operationalize our infrastructure as the platform grows. This is not a pure platform-architecture role; the focus is CI/CD, infrastructure automation, deployment reliability, observability, and GPU-oriented workload scaling.

What You'll Own

- Improve CI/CD pipelines, deployment workflows, and release reliability
- Standardize infrastructure and deployment patterns across environments
- Improve observability through logging, metrics, tracing, dashboards, and rollout monitoring
- Partner closely with backend engineering on:
  - deployment strategies
  - infrastructure automation
  - environment consistency
  - migration workflows
  - possible Kubernetes migration efforts
- Support ML-oriented infrastructure as a secondary responsibility:
  - SageMaker workloads
  - Ray clusters
  - GPU scaling patterns
  - distributed batch execution
  - autoscaling behavior
  - runtime/image management
  - artifact delivery/versioning

The Kind of Problems You'll Work On

- Deployment safety and rollback strategies
- Infrastructure consistency across environments
- Release automation and environment promotion flows
- Autoscaling and runtime stability
- GPU workload orchestration and scaling efficiency
- Operational tooling that reduces friction for engineering teams

Stack

- AWS
- Terraform
- Docker
- Kubernetes
- CI/CD systems
- SageMaker
- Ray
- GPU compute infrastructure

You'll Probably Do Well Here If

- You've operated production infrastructure at meaningful scale
- You're strong in practical DevOps execution and operational reliability
- You care about automation, observability, and deployment safety
- You're comfortable improving developer workflows and infrastructure tooling
- You've worked with distributed systems or GPU-oriented workloads before

