
Member of Technical Staff - Efficient ML

Embedding Vc
Millbrae, CA
April 14th, 2026
Introducing Moonlake, AI for creating world simulations.

Scope of Work

Training efficiency
Dataloaders, fusion, activation remat, gradient checkpointing.
FSDP/ZeRO/tensor+pipeline parallel; NCCL tuning.

GPU + kernel performance
Nsight profiling, Triton/CUDA kernels, fused ops.
Flash-attention–style speedups, sequence packing, KV-cache tricks.

Inference optimization
Low-latency serving, continuous batching, speculative decoding.
Quantization (GPTQ/AWQ), distillation, pruning.

Infra + reliability
SLURM/K8s multi-node jobs, checkpoint hygiene.
Determinism, env pinning, GPU failure handling.

We are committed to being an on-site, in-person team currently based in San Mateo.