Machine Learning Engineer (Inference)

San Francisco, On-Site
$200,000-$300,000 + equity

Why this role
Early-stage infra company building a next-gen AI cloud (a "neocloud"), rethinking how models run across heterogeneous hardware.
You’ll own the layer that actually executes models in production.

🧠 What you’ll do
- Build end-to-end inference systems (request → runtime → response)
- Optimise for latency, throughput, and concurrency under real load
- Design batching, scheduling, and queuing systems
- Manage KV cache + memory at scale
- Debug performance across model → runtime → hardware

⚙️ The fun technical bits
- Deep dives into LLM inference (prefill, decode, attention)
- Solving tail latency vs. throughput trade-offs
- Working across systems, ML, and hardware layers
- Optimising across GPUs + next-gen accelerators
- Hands-on with vLLM, TensorRT-LLM, or custom runtimes

🎯 What they want
- Experience with ML inference / model serving systems
- Strong systems or backend engineering fundamentals
- Comfortable with performance, memory, and scaling challenges
- Python + C++