Senior Datacenter Performance Model Engineer
Job Requisition ID: JR2018874
Job Category: Engineering
Time Type: Full time

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. We are looking for forward-thinking, hard-working, and creative people to join a fast-moving, multifaceted software team! This software engineering role involves developing datacenter-scale performance modeling and prediction tools for AI researchers running AI workloads in GPU clusters.

What You'll Be Doing

- Build performance modeling and prediction tools for AI workloads at datacenter scale.
- Develop production tools and workflows used by multiple teams, both within NVIDIA and at its customers.
- Automate workflows, including searching for the most efficient configurations over millions of parameters.
- Partner with HW and SW architects to propose new features or improve existing features with real-world use cases.

What We Need To See

- BS+ in Computer Science or a related field (or equivalent experience) and 5+ years of software development.
- Strong software skills in design, coding (C++ and Python), analysis, and debugging.
- Good understanding of Deep Learning frameworks like PyTorch and TensorFlow, and of distributed training and inference.
- Knowledge of GPU cluster job scheduling (Slurm or Kubernetes), storage, and networking.
- Experience with NVIDIA GPUs, CUDA Programming, and Networking.
- Motivated self-starter with strong problem-solving skills and customer-facing communication skills.
- Passion for continuous learning.
- Ability to work concurrently with multiple global groups.

Ways To Stand Out From The Crowd

- Proven SW engineering experience deploying SW at datacenter scale.
- Solid experience in large AI job performance analysis for training/inference workloads.
- Knowledge of Linux device drivers and/or compiler implementation.
- Knowledge of GPU and/or CPU architecture and general computer architecture principles.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 1, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.