GPU Systems / AI Infra Engineer
Senior GPU Systems / AI Infrastructure Engineer (NYC)

Location: New York City (Hybrid / On-site preferred)
Comp: Competitive + equity (Series A-C / high-growth AI infra)

About the Role

We're hiring a senior-level engineer to build and optimise next-generation AI infrastructure powering large-scale model training and inference. This role sits at the intersection of GPU systems, kernel optimisation, distributed compute, and high-performance AI workloads.

You'll work directly on the performance layer of modern AI stacks, where milliseconds matter, GPUs are saturated, and inefficiencies translate directly into cost and latency at scale.

This is a deeply technical role for engineers who are comfortable working close to the metal and care about squeezing every ounce of performance out of modern accelerators (NVIDIA, AMD, and emerging architectures).

What You'll Work On

- Design and optimise GPU kernels (CUDA / Triton / HIP) for large-scale AI workloads
- Build and tune high-performance inference and training pipelines for LLMs and multimodal models
- Work on distributed systems for AI training (multi-node, multi-GPU clusters)
- Improve memory bandwidth utilisation, kernel fusion, and compute efficiency
- Contribute to or extend frameworks like PyTorch, JAX, or custom runtimes
- Build tooling for profiling, benchmarking, and performance regression detection
- Collaborate closely with ML researchers and infra engineers to remove system bottlenecks

What We're Looking For (Core Profile / MPC Fit)

You're likely a strong match if you have:

- 5-10+ years in systems engineering, HPC, GPU computing, or AI infrastructure
- Deep experience with CUDA programming and GPU kernel optimisation
- Strong understanding of parallel computing, memory hierarchies, and compute bottlenecks
- Experience with distributed systems (Ray, MPI, NCCL, custom cluster orchestration, etc.)
- Background in high-performance C++ / Rust / Python systems
- Experience working on training or inference stacks for large-scale ML models
- Strong intuition for performance profiling (Nsight, perf, flamegraphs, etc.)

Nice to Have

- Experience with Triton, TVM, or MLIR-based compiler stacks
- Exposure to kernel fusion, graph compilation, or runtime optimisation
- Experience at AI infra startups, hyperscalers, or HPC environments
- Familiarity with quantisation, KV caching, or inference acceleration techniques
- Contributions to open-source ML systems or GPU libraries
- Background in CUDA graph execution, stream scheduling, or warp-level optimisation

Why This Role

- Work on the critical performance layer of AI systems (not application-level ML)
- Direct impact on cost, latency, and scalability of frontier AI models
- High autonomy: own entire subsystems (kernel → runtime → distributed execution)
- NYC-based team building at the forefront of AI infrastructure and compute optimisation
- Opportunity to shape systems used at massive scale in production ML workloads

Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.