Founding Cloud Inference Engineer (Low-Latency AI Serving)
ARCHIVED
A pioneering AI technology firm in San Francisco is seeking a founding member to optimize and serve models on Luminal Cloud. The role involves deploying models with advanced optimization techniques, conducting performance reviews, and enhancing scheduling processes. Ideal candidates are experienced in CUDA and GPU optimization, with hands-on knowledge of vLLM, SGLang, or TensorRT-LLM. A degree is not required, reflecting a modern approach to tech recruitment.