
Lead MLOps / AI Platform Engineer

Via Dice | Charlotte, NC | May 24th, 2026
Dice is the leading career destination for tech experts at every stage of their careers. Our client, SATCON Inc, is seeking the following. Apply via Dice today!

Job Description: Lead MLOps / AI Platform Engineer
Location: Charlotte, NC
Duration: Long Term
Visa Type: & Candidates

Role Overview

We are seeking a highly skilled Lead MLOps / AI Platform Engineer to design, build, and optimize our next-generation Generative AI and Large Language Model (LLM) infrastructure. This role is pivotal in bridging the gap between cutting-edge AI research and robust production deployment. You will be responsible for orchestrating high-performance GPU environments (specifically leveraging Nvidia H200s), optimizing LLM inference, and maintaining enterprise-grade infrastructure across both Cloud (Google Cloud Platform/Azure) and On-Premise environments.

Key Responsibilities

AI Inference Optimization & Serving
- Deploy, scale, and manage large-scale language models using advanced inference frameworks such as vLLM, TensorRT-LLM, SGLang, and Triton Inference Server.
- Implement and fine-tune performance optimization strategies, including Continuous Batching and advanced Parallelism techniques.
- Conduct load testing, benchmarking, and profiling of LLM deployments using GuideLLM and Locust to ensure optimal latency and throughput.

Cloud & Infrastructure Orchestration
- Architect and maintain scalable, secure infrastructure on Google Cloud Platform and Azure using Infrastructure as Code (Terraform).
- Design and execute Cloud Networking, Landing Zones, and Organization Policies/Governance.
- Manage secrets and secure workloads utilizing HashiCorp Vault.
- Develop and champion Internal Developer Portals to streamline workflows for data science and product teams.

On-Premise & Kubernetes Engineering
- Orchestrate ML workloads on Kubernetes, utilizing KServe, OpenShift AI / OpenShift Functions, and Helm charts/Operators.
- Manage compute clusters with a heavy focus on advanced GPU Orchestration (Nvidia H200 ecosystems).
- Demonstrate deep hands-on expertise in architecture and "know-how to unfold an LLM" into highly constrained or custom on-premise hardware setups.

Observability & SRE
- Implement end-to-end ML Observability and monitoring frameworks using Arize AI.
- Establish Site Reliability Engineering (SRE) best practices, maintaining strict SLOs/SLIs for model deployment pipelines and production APIs.

Required Skills & Qualifications

Core AI / MLOps Stack:
- Inference Engines: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
- ML Frameworks/Ops: KServe, OpenShift AI, Arize AI, GenAI Platforms, RAG architecture
- Performance & Testing: GuideLLM, Locust, Continuous Batching, Parallelism optimization

Infrastructure & Cloud Stack:
- Cloud Providers: Google Cloud Platform (GCP), Microsoft Azure
- Containerization & Orchestration: Kubernetes, OpenShift, Helm/Operators, GPU Orchestration
- IaC & Automation: Terraform, Python
- Security & Networking: HashiCorp Vault, Landing Zones, Org Policy & Governance

Hardware Sanity Check:
- Mandatory Experience: Direct, hands-on experience working with Nvidia H200 GPUs and optimizing workloads specifically for this architecture.