Research Engineer - CUDA Kernel Engineering
Palo Alto, CA
April 3rd, 2026
Job Description
About Voltai
Voltai is developing world models and agents to learn, evaluate, plan, experiment, and interact with the physical world. We are starting out with understanding and building hardware: electronic systems and semiconductors, where AI can design and create beyond human cognitive limits.

About the Team
We are backed by Silicon Valley's top investors, Stanford University, and CEOs/Presidents of Google, AMD, Broadcom, Marvell, etc. We are a team of former Stanford professors, SAIL researchers, Olympiad medalists (IPhO, IOI, etc.), CTOs of Synopsys & GlobalFoundries, Head of Sales & CRO of Cadence, former US Secretary of Defense, National Security Advisor, and Senior Foreign-Policy Advisor to four US presidents.

About the Role
You will develop, integrate, and optimize state-of-the-art CUDA kernels to power AI models that accelerate semiconductor design and verification. Your work will enable large-scale model training, inference, and reinforcement learning systems that reason about circuit layouts, generate and validate RTL, and optimize chip architectures, all running efficiently across thousands of GPUs.

You'll build tools, performance benchmarks, and integration layers that push the limits of GPU utilization for compute-intensive workloads in AI-driven hardware design. Working closely with researchers and engineers, you'll help make Voltai the world's leading AI + semiconductor research organization.

You'll also release your kernels and tooling as contributions to the open-source AI and HPC ecosystems.

You might thrive in this role if you have experience with
- Writing and optimizing CUDA kernels for large-scale AI workloads (attention, routing, graph-based operations, physics-inspired operators, etc.)
- Profiling and optimizing GPU performance for custom compute-bound or memory-bound workloads
- Integrating custom kernels into cutting-edge training and inference frameworks (e.g., PyTorch, Megatron, vLLM, TorchTitan)
- Working with the latest NVIDIA hardware and software stacks (Hopper, Blackwell, NVLink, NCCL, Triton)
- Building GPU-accelerated primitives for graph reasoning, symbolic computation, or hardware simulation tasks
- Collaborating with AI researchers and semiconductor experts to translate domain-specific workloads into high-performance GPU code