AI/ML ASIC Architect -- KUMDC5804128 (Milpitas)

Compunnel | Milpitas, CA | May 17th, 2026
Job Summary

As an AI/ML ASIC Architect, you will design advanced system architectures and accelerator specifications for the client's next-generation products. This role focuses on developing frontend architectures that integrate AI storage solutions with GPUs, TPUs, and xPUs in 3D package systems. You will collaborate across design, verification, simulation, emulation, and firmware teams to deliver innovative, competitive, and adaptive accelerator solutions that redefine data-centric architectures.

Key Responsibilities

- Drive AI/ML ASIC architecture integrating AI storage with GPU/TPU/xPU accelerators, with emphasis on I/O subsystems over UCIe, PCIe, and CXL.
- Author clear and concise architecture specifications for AI/ML xPU-based accelerators.
- Define I/O subsystem and PCIe DMA architectures, including interactions with embedded processors, NoC, memory controllers, and FPGA fabric.
- Create modular I/O subsystem architectures deployable in chiplet, monolithic, or 3D form factors.
- Collaborate with customers and cross-functional teams to scope SoC requirements, analyze PPA trade-offs, and define architectural requirements.
- Guide pre-silicon design/verification and post-silicon validation during execution phases.
- Improve ASIC architecture performance through hardware/software co-optimization and post-silicon analysis.
- Conduct workload analysis and characterization of ASICs and competitive AI/datacenter solutions to identify performance improvement opportunities.
- Architect components such as HBM, PCIe/UCIe/CXL, NoC, DMA, NAND, and fabrics.
- Drive frontend system architecture to meet or exceed next-generation HBM bandwidth.
- Architect memory-efficient inference/training systems using pruning, quantization, batching, and speculative decoding.
- Collaborate with ML researchers and stakeholders to iterate rapidly and disseminate results.

Required Qualifications

- Bachelor's, Master's, or Ph.D. in Computer/Electrical Engineering.
- 15+ years of hands-on architecture experience authoring specifications.
- Strong background in ASIC, SoC, or I/O subsystem architecture involving PCIe/UCIe/CXL and DMA engines.
- Knowledge of I/O subsystem and DMA interactions with embedded processors (x86, RISC-V, ARM) and host CPUs.
- Deep understanding of computer/graphics architecture, ML, and LLMs.
- Experience architecting GPU/TPU/xPU systems with optimized high-bandwidth memory hierarchies for multi-trillion parameter LLM training/inference.
- Expertise in KV cache optimization, Flash Attention, and Mixture of Experts.
- Strong experience optimizing large-scale ML systems and GPU architectures.
- Knowledge of ARM processors and AXI interconnects.

Preferred Qualifications (if any)

- Familiarity with UCIe, CXL, NVLink, or UAL microarchitecture and protocols.
- Experience with high-speed networking (InfiniBand, RDMA, NVLink).
- Expert knowledge of transformer architectures, attention mechanisms, and model parallelism techniques.
- Multi-disciplinary experience including firmware and ASIC design.
- Expertise in CUDA programming, GPU memory hierarchies, and hardware-specific optimizations.
- Proven track record architecting distributed training systems at scale.
- Experience with NVMe storage systems, protocols, and NAND flash.

Certifications (if any)

- Relevant certifications in ASIC/SoC architecture, AI/ML, or hardware design (preferred but not required).