Lead MLOps / AI Platform Engineer
Dice is the leading career destination for tech experts at every stage of their careers. Our client, SATCON Inc, is seeking the following. Apply via Dice today!

Job Description: Lead MLOps / AI Platform Engineer
Location: Charlotte, NC
Duration: Long Term
Visa Type: & Candidates

Role Overview

We are seeking a highly skilled Lead MLOps / AI Platform Engineer to design, build, and optimize our next-generation Generative AI and Large Language Model (LLM) infrastructure. This role is pivotal in bridging the gap between cutting-edge AI research and robust production deployment. You will be responsible for orchestrating high-performance GPU environments (specifically leveraging Nvidia H200s), optimizing LLM inference, and maintaining enterprise-grade infrastructure across both Cloud (Google Cloud Platform/Azure) and On-Premise environments.

Key Responsibilities

AI Inference Optimization & Serving
Deploy, scale, and manage large-scale language models using advanced inference frameworks such as vLLM, TensorRT-LLM, SGLang, and Triton Inference Server.
Implement and fine-tune performance optimization strategies, including Continuous Batching and advanced Parallelism techniques.
Conduct load testing, benchmarking, and profiling of LLM deployments using GuideLLM and Locust to ensure optimal latency and throughput.

Cloud & Infrastructure Orchestration
Architect and maintain scalable, secure infrastructure on Google Cloud Platform and Azure using Infrastructure as Code (Terraform).
Design and implement Cloud Networking, Landing Zones, and Organization Policies/Governance.
Manage secrets and secure workloads using HashiCorp Vault.
Develop and champion Internal Developer Portals to streamline workflows for data science and product teams.

On-Premise & Kubernetes Engineering
Orchestrate ML workloads on Kubernetes, using KServe, OpenShift AI / OpenShift Functions, and Helm charts/Operators.
Manage compute clusters with a heavy focus on advanced GPU Orchestration (Nvidia H200 ecosystems).
Demonstrate deep hands-on architectural expertise, including the know-how to "unfold an LLM" onto highly constrained or custom on-premise hardware setups.

Observability & SRE
Implement end-to-end ML Observability and monitoring frameworks using Arize AI.
Establish Site Reliability Engineering (SRE) best practices, maintaining strict SLOs/SLIs for model deployment pipelines and production APIs.

Required Skills & Qualifications

Core AI / MLOps Stack:
Inference Engines: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
ML Frameworks/Ops: KServe, OpenShift AI, Arize AI, GenAI Platforms, RAG architecture
Performance & Testing: GuideLLM, Locust, Continuous Batching, Parallelism optimization

Infrastructure & Cloud Stack:
Cloud Providers: Google Cloud Platform, Microsoft Azure
Containerization & Orchestration: Kubernetes, OpenShift, Helm/Operators, GPU Orchestration
IaC & Automation: Terraform, Python
Security & Networking: HashiCorp Vault, Landing Zones, Org Policy & Governance

Hardware Sanity Check:
Mandatory Experience: Direct, hands-on experience working with Nvidia H200 GPUs and optimizing workloads specifically for this architecture.