Machine Learning Engineer

Location: Bay Area (frequent customer interaction)
Team: Inference & Reinforcement Learning Platform

About the Role

We’re looking for a Machine Learning Engineer (MLE) to work directly with customers and partners to design, deploy, and validate inference and reinforcement learning (RL) proof-of-concepts on GMI’s GPU infrastructure.

This is a high-impact, hybrid engineering role that sits at the intersection of platform engineering, applied ML, and customer success. You’ll be embedded with customers during early-stage deployments, turning research ideas, datasets, and business requirements into working, performant systems on real GPU clusters.

If you enjoy being close to users, debugging real systems, and shipping results fast (not just writing docs), this role is for you.

What You’ll Do

Own customer POCs end-to-end
- Deploy and optimize LLM inference, RL training, and post-training workflows on GMI clusters
- Translate customer requirements into concrete system designs and experiments

Forward-deploy with customers
- Work hands-on with research teams, startups, and enterprise customers
- Debug performance, stability, and correctness issues in real environments

Inference deployment
- Stand up and tune inference stacks (e.g. vLLM / SGLang / Ray Serve–style architectures)
- Optimize latency, throughput, GPU utilization, and cost efficiency

RL & post-training POCs
- Support RLHF / RFT / SFT workflows using customer-provided datasets
- Integrate SDKs, training APIs, and cluster resources to shorten “idea → experiment” cycles

Performance & reliability
- Diagnose GPU, networking, and distributed system bottlenecks
- Run benchmarks, profiling, and stress tests on multi-GPU / multi-node setups

Feedback loop to product
- Feed real-world customer learnings back into GMI’s platform, SDKs, and APIs
- Help shape reference architectures, cookbooks, and best practices

What We’re Looking For

Core Requirements
- Strong software engineering background (Python required; Go / Rust a plus)
- Hands-on experience with ML inference or training systems
- Familiarity with distributed systems and GPUs (multi-GPU, multi-node)
- Comfort working directly with customers and ambiguous requirements
- Ability to debug end-to-end systems (code, infra, networking, performance)

Nice to Have

Experience with:
- LLM inference frameworks (vLLM, SGLang, Ray Serve, Triton, etc.)
- RL or post-training workflows (RLHF, RFT, SFT)
- PyTorch, DeepSpeed, Megatron-LM, or similar
- Kubernetes-based ML platforms
- GPU performance profiling and optimization

Prior experience as:
- Forward Deployed Engineer
- Solutions Engineer
- ML Platform Engineer
- Applied Research Engineer

What Makes This Role Special
- You’re close to real users and real GPUs, not abstract roadmaps
- You’ll work on cutting-edge inference and RL workloads, not toy demos
- You’ll influence product direction through direct customer feedback
- Fast iteration, high ownership, and visible impact

Who Thrives Here
- Engineers who like shipping over theorizing
- People who enjoy being the “last mile” problem solver
- Builders who want exposure to both deep systems and applied ML
- Those excited by early-stage POCs that turn into real production systems