GPU Software Engineer

Role: GPU Software Engineer
Location: San Jose, CA – Onsite
Duration: 12+ Months / Contract-to-Hire

Overview:
The Client is seeking an experienced GPU Software Engineer for a 12-month, milestone-based engagement supporting a cutting-edge GPU software integration project. The consultant will work on AMD GPU platforms, drive AI stack development, contribute to open-source projects, and deliver performance benchmarking and integration reports across a structured set of monthly deliverables. This is a highly technical, hands-on role requiring deep expertise in GPU software stacks, ROCm, AI frameworks, and systems-level integration.

Position Details:
Project Title: GPU SW Integration for Samsung Cognos
Engagement Type: Contract / Milestone-Based (12 Months)
Client Environment: AMD MI210 GPU, CXL Memory, NVMe Gen6, ROCm Stack
Delivery Tools: Confluence, Jira, GitHub/GitLab (client-provided)

Key Responsibilities:
Design and develop GPU software modules aligned with project milestones.
Perform systems integration and end-to-end testing of AI stack software modules.
Validate AMD Infinity Bridge and AIS on MI210 GPU hardware.
Conduct functional and performance benchmarking (pSLC firmware, CXL, ROCm).
Implement and validate SGLang changes for L3-to-L1 memory transfer optimization.
Develop and contribute CaMa module changes to the ROCm software stack.
Collaborate with the SGLang open-source community and contribute code to its public GitHub repository.
Develop the CaMa module for ROCm over Infinity Fabric/Ethernet.
Perform end-to-end (E2E) performance benchmarking and publish formal benchmarking reports.
Integrate CaMa changes into the Cognos AI stack and publish integration documentation.
Scope UALink support for CaMa and publish an investigation/feasibility document.
Maintain all documentation, code, and status updates in Confluence, Jira, and GitHub/GitLab.

Required Skills and Qualifications:

GPU Software and Hardware:
Hands-on experience with AMD GPU platforms, specifically the MI210.
Proficiency with the AMD ROCm software stack, including kernel libraries and drivers.
Experience with AMD Infinity Bridge / Infinity Fabric architecture.
Familiarity with CXL (Compute Express Link) memory integration.
Experience with NVMe storage and GPU Direct Storage (GDS).

AI Frameworks and Software Stack:
Experience with SGLang or similar LLM inference frameworks.
Familiarity with AI stack installation and end-to-end workload benchmarking.
Knowledge of the GPU memory hierarchy (HBM, L1/L3 cache) and data transfer optimization.
Proficiency in GPU kernel programming and library management (e.g., GDS, CaMa).

Programming and Tools:
Strong proficiency in C/C++ and Python for GPU and systems-level development.
Experience with open-source contribution workflows (GitHub, pull requests, code reviews).
Familiarity with Jira and Confluence for project management and documentation.
Experience with pSLC firmware validation and performance benchmarking methodologies.

Soft Skills:
Ability to work independently and deliver against defined monthly milestones.
Strong written communication skills for publishing technical reports and documentation.
Collaborative mindset and the ability to work with third-party teams (AMD, SGLang community).

Preferred Qualifications:
Prior experience with the Samsung Cognos AI stack or similar enterprise AI platforms.
Familiarity with the UALink protocol and its GPU interconnect applications.
Prior open-source contributions to ROCm, SGLang, or similar GPU frameworks.
Experience presenting benchmarking results to semiconductor partners (AMD, NVIDIA, etc.).