AI Infrastructure Engineer
Utilidata is a fast-growing NVIDIA-backed AI company enabling AI data centers to dynamically orchestrate power and unlock more compute capacity from existing energy infrastructure. For over a decade, we have applied AI to the electric grid, bringing real-time visibility and power-flow control to complex energy infrastructure. Our Karman platform, built on a custom NVIDIA module, brings that same capability to AI data centers, giving operators a way to better use the power already available to them.

The AI Infrastructure Engineer is responsible for designing, building, and owning the end-to-end infrastructure that serves Utilidata's AI and ML models across edge deployments, cloud environments, and data center integrations. They are also responsible for designing, building, and owning the integration of power data with AI inference software. This is Utilidata's first dedicated role of this kind, and will serve as the foundational function for how the company deploys and operates AI capabilities in production. The role requires deep technical expertise in ML model serving, distributed systems, and GPU infrastructure, with a strong emphasis on reliability, performance, and scalability.
This position works cross-functionally with product, engineering, and data science teams and is open to fully remote candidates, with periodic travel expected for company retreats and key on-site engagements.

Responsibilities
- Lead the design and build of Utilidata's AI inference platform, establishing architecture patterns, deployment standards, and operational practices that will scale with the company
- Own end-to-end model serving infrastructure for Utilidata's AI infrastructure (on-prem and datacenter)
- Build and maintain fault-tolerant, high-performance systems for serving AI models at scale, with a focus on low latency, reliability, and cost efficiency
- Collaborate closely with algorithms engineers to integrate AI inference data and configuration with power optimization algorithms
- Optimize GPU utilization and inference performance across our hardware fleet, including NVIDIA accelerators central to Utilidata's edge AI platform
- Establish MLOps best practices including CI/CD pipelines for model deployment, monitoring, and rollback across environments
- Contribute to infrastructure roadmap decisions, including build vs. buy tradeoffs, tooling selection, and platform evolution as the team grows

Minimum Qualifications
- 5+ years of software engineering experience with a strong focus on AI infrastructure, backend systems, or distributed systems
- Hands-on experience with AI model serving frameworks (e.g., vLLM, SGLang, Triton, TensorRT, TorchServe, or similar)
- Understanding of container orchestration and cluster management (Kubernetes, Docker)
- Experience deploying and operating infrastructure across both datacenter and on-prem environments
- Strong knowledge of GPU workloads and the tradeoffs that come with them: you understand how inference differs from training, and why it matters
- Proficiency in Python; C++, CUDA, Go, or Rust a plus
- Excellent communication skills and comfort working cross-functionally in a lean, fast-moving environment
- Willingness to travel up to 10% of the time

Enhanced Qualifications (Nice to Have)
- Dynamo experience
- Experience with edge AI deployments or constrained compute environments
- Familiarity with infrastructure as code (Terraform, Helm)
- Experience with observability platforms (Datadog, Prometheus, Grafana)
- Background in energy, utilities, or industrial IoT
- Contributions to open-source ML infrastructure projects

Salary Range: $170,000 to $210,000 base compensation depending on experience, plus stock options. Salary will be commensurate with an individual's skills, training, and years of experience, and in line with internal compensation bands.

Location: This position can be performed remotely from anywhere in the United States.

Our Commitments:
Utilidata values the diversity of our team.
We provide equal employment opportunities without regard to race, color, religion, creed, sex, gender, sexual orientation, gender identity or expression, national origin, age, physical disability, mental disability, medical condition, pregnancy or childbirth, genetics, genetic information, marital status, or status as a covered veteran or any other basis protected by applicable federal, state, and local laws.

We are committed to:
- Creating a diverse and inclusive workplace that is welcoming, supportive, affirming, and respectful
- Empowering employees to solve problems and work together to make a difference
- Providing mentorship and growth opportunities as part of a collaborative team
- Providing a flexible work environment with flexible paid time off
- Offering competitive compensation and benefits, including health, dental, vision, and an employer-matched 401(k)