Performance Kernel Engineer for High-Speed GPU Inference

Inferact | Millbrae, CA | L6 Lead | June 26th, 2026

Computer Systems Engineers/Architects | Software Publishers

Inferact Inc. is seeking a Performance Engineer in San Francisco, California, focused on optimizing GPU performance for vLLM, the world's AI inference engine. The ideal candidate will have deep experience in CUDA, a strong grasp of GPU architecture, and the ability to write high-performance code in C++ and Python. This role can be remote for exceptional candidates. The compensation range is $200,000 - $400,000 plus equity, with excellent health benefits.
