Performance Kernel Engineer for High-Speed GPU Inference

ARCHIVED
Inferact | Millbrae, CA | June 26th, 2026

Inferact Inc. is seeking a Performance Engineer in San Francisco, California, to optimize GPU performance for vLLM, the open-source AI inference engine. The ideal candidate will have deep experience in CUDA, a strong grasp of GPU architecture, and the ability to write high-performance code in C++ and Python. The role can be remote for exceptional candidates. Compensation ranges from $200,000 to $400,000 plus equity, with excellent health benefits.