AI Performance Engineer

ARCHIVED
Graphcore | Milpitas, CA | April 12th, 2026

About Us

Graphcore is one of the world’s leading innovators in Artificial Intelligence compute. It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary

Graphcore’s AI/ML training and inference infrastructure is rapidly scaling to meet the growing demands of AI workloads across mobile, edge, and datacenter environments. This role focuses on optimizing performance across ARM-based architectures and large-scale distributed systems, ensuring efficiency, scalability, and reliability across the full hardware-software stack.

The Team

The System Engineering Performance team architects and optimizes high-performance infrastructure for large-scale datacenter deployments. The team works across hardware, software, networking, and system architecture to deliver cutting-edge AI solutions and ensure optimal system performance at scale.

Responsibilities and Duties

- Analyze ML models’ compute and memory requirements using roofline analysis and simulations
- Collaborate across hardware and software teams to optimize large-scale AI workloads
- Benchmark, monitor, and troubleshoot system performance across distributed systems
- Optimize communication stacks including MPI, NCCL, UCX, RDMA, and networking fabrics
- Profile and optimize AI workloads, focusing on performance bottlenecks
- Develop high-quality, ARM-compatible code and documentation

Candidate Profile

Essential

- BS/MS in Computer Science, Electrical Engineering, or related field
- Experience with distributed systems and communication libraries (MPI, NCCL, UCX, libfabric)
- Strong programming skills in C++ and Python
- Experience profiling and optimizing HPC or AI/ML workloads
- Familiarity with ML benchmarks such as MLPerf

Desirable

- Experience with GPUs or accelerated computing architectures
- Knowledge of HPC networking and interconnect technologies (InfiniBand, RoCE)
- Familiarity with ML frameworks such as PyTorch or TensorFlow
- Understanding of ARM architectures and toolchains
- Strong debugging, profiling, and performance optimization skills

In addition to a competitive salary, Graphcore offers flexible working and a comprehensive benefits package designed to support your health, wellbeing and financial future. Our benefits include medical, dental and vision coverage, Flexible Spending Accounts (FSAs), Health Savings Accounts (HSAs), disability and life insurance, a 401(k) retirement plan, commuter benefits, wellness services and an Employee Assistance Programme (EAP).

We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone.
We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.