Software Engineer, ML Inference

San Francisco (On-Site)
$250,000–$320,000 base + equity

Why this role

Early-stage infrastructure company building a next-generation AI cloud, rethinking how frontier models run across heterogeneous compute environments.

This team is focused on the hardest part of the stack: making large-scale model inference fast, reliable, and production-ready.

You’ll own the systems that actually execute models in production, working across runtime, serving infrastructure, memory management, and hardware optimisation.

What you’ll do

- Build and scale end-to-end inference systems from request → runtime → response
- Optimise latency, throughput, concurrency, and reliability under real production workloads
- Design batching, scheduling, and queuing systems for high-performance serving
- Improve KV cache management and memory efficiency at scale
- Debug performance bottlenecks across model, runtime, and hardware layers
- Work closely with systems, infrastructure, and ML teams to push inference performance forward

What makes this interesting

- Deep work on LLM inference internals, including prefill, decode, and attention optimisation
- Solving real-world trade-offs between tail latency and throughput
- Optimising workloads across GPUs and next-generation accelerators
- Hands-on work with vLLM, TensorRT-LLM, and custom inference runtimes
- Opportunity to shape core infrastructure at an early-stage company

What they’re looking for

- Experience building ML inference or model serving systems
- Strong systems engineering or backend infrastructure fundamentals
- Experience working on performance, scaling, memory, or distributed systems challenges
- Strong Python and/or C++ skills
- Familiarity with modern inference frameworks and runtimes is a plus