Lead DevOps/MLOps Engineer

We're looking for a strong DevOps engineer who can help scale and operationalize our infrastructure as the platform grows. This is not a pure platform-architecture role; the focus is CI/CD, infrastructure automation, deployment reliability, observability, and GPU-oriented workload scaling.

What You'll Own
- Improve CI/CD pipelines, deployment workflows, and release reliability
- Standardize infrastructure and deployment patterns across environments
- Improve observability through logging, metrics, tracing, dashboards, and rollout monitoring
- Partner closely with backend engineering on:
  - deployment strategies
  - infrastructure automation
  - environment consistency
  - migration workflows
  - possible Kubernetes migration efforts
- Support ML-oriented infrastructure as a secondary responsibility:
  - SageMaker workloads
  - Ray clusters
  - GPU scaling patterns
  - distributed batch execution
  - autoscaling behavior
  - runtime/image management
  - artifact delivery/versioning

The Kind of Problems You'll Work On
- Deployment safety and rollback strategies
- Infrastructure consistency across environments
- Release automation and environment promotion flows
- Autoscaling and runtime stability
- GPU workload orchestration and scaling efficiency
- Operational tooling that reduces friction for engineering teams

Stack
- AWS
- Terraform
- Docker
- Kubernetes
- CI/CD systems
- SageMaker
- Ray
- GPU compute infrastructure

You'll Probably Do Well Here If
- You've operated production infrastructure at meaningful scale
- You're strong in practical DevOps execution and operational reliability
- You care about automation, observability, and deployment safety
- You're comfortable improving developer workflows and infrastructure tooling
- You've worked with distributed systems or GPU-oriented workloads before