Member of Technical Staff - ML Infrastructure & Performance

Embedding Vc
San Mateo, CA
May 3rd, 2026

Computer Systems Engineers/Architects
Computer Systems Design and Related Services

Introducing Moonlake, AI for creating real-time interactive content.

Mission: Improve throughput, latency, and cost, deploying our models 2–10× faster and cheaper without quality regressions.

Scope of Work:
GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.
Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing.
Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.
Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.
Systems: Ray/k8s/Argo, observability (Prom/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.

Tech signals: Previous experience at infra-heavy startups such as Databricks, Roblox.

We are committed to being an on-site, in-person team currently based in San Mateo.
