GPU Technical Support

ARCHIVED

Title: AI Lab Technical Support Specialist
Location: Milpitas, CA 95035
Duration: 3+ Months

Position Summary
100% on site
Monday through Friday, 8 AM to 5 PM

Role Overview
We're looking for a hands-on engineer/technician to assist with the setup, maintenance, and operation of our high-performance computing cluster. This role is ideal for someone with practical experience with Linux systems in the data center who enjoys working in a fast-paced technical environment.

Key Responsibilities
Rack, stack, cable, and maintain the AI data center and lab.
Perform routine maintenance and troubleshooting on Linux servers, storage, and networking systems.
Use tools to monitor and troubleshoot hardware issues.
Work closely with engineers and developers to ensure smooth operation of the AI infrastructure.

This role is a hands-on, hardware-focused technical support position supporting GPU/compute clusters in an AI lab/R&D environment. The emphasis is on hardware troubleshooting, Linux-based system support, and a deep understanding of compute architecture, rather than software development.

Key Responsibilities
Troubleshoot GPU/CPU servers, compute clusters, and networking (InfiniBand)
Diagnose hardware issues (cabling, components, GPUs, servers)
Rack/stack work is initially limited (systems already built), but may increase if extended
Replace/install server components within racks
Use the Linux command line extensively for diagnostics and system validation
Manage lab space and hardware inventory (re-procurement access provided)

Must-Have Skills (Non-Negotiable)
Strong hardware troubleshooting experience (servers, GPUs, compute systems)
Solid understanding of computer/compute architecture
Strong Linux skills for system bring-up and troubleshooting
Experience with GPUs and high-performance compute environments
Ability to independently diagnose and resolve hardware/system issues

Preferred / Nice to Have
Prior data center or HPC/compute cluster experience (a plus, not mandatory)
Scripting experience (Bash, Python), expected if the candidate has held similar roles
Familiarity with GPU technologies (cutting-edge R&D GPUs; Tesla, etc.)
Candidates who've built systems themselves (gaming PCs, lab servers, small data centers)

Experience & Education
Minimum: 3-4 years of relevant experience (not pure sysadmin only)
Bachelor's degree preferred, but experience matters more than degree
No travel required

Required Skills/Experience
Experience with assembly of mechanical or electrical systems, or performing component-level repairs and troubleshooting on technical equipment.
Ability to lift/move 50 lb (23 kg) of equipment and to exert yourself physically over extended periods of time, including frequent bending, kneeling, climbing, pushing/pulling, and lifting.
Experience working within a data center or network operations center environment.
Comfortable working in a Linux environment and able to diagnose and troubleshoot issues in operating systems, computer/server hardware, or the networking stack.
Able to write and understand simple Bash or Python scripts.
Exposure to Git, Jenkins, or similar tools is a plus.