Senior ML Performance Engineer
Santa Clara, CA · April 3rd, 2026
Job Description
About Us
At Lemurian Labs, we're reimagining the foundations of computing to make AI accessible to everyone. Our mission is to remove the limits of scale, hardware, and cost that hold back innovation, so the people solving humanity's hardest problems can move faster.

We're building a new kind of software stack: a hardware-agnostic platform that makes every system, from a laptop to a supercomputer, feel like one seamless engine. Developers can write once, run anywhere, and get state-of-the-art performance across any chip, any cloud, at any scale. It's a complete rethink of how software and hardware interact, designed for the era beyond Moore's Law.

We're not looking for the comfortable or the conventional; we're looking for the bold: engineers who crave frontier problems, who want to bend the limits of what's possible, and who see infrastructure not as a constraint but as a canvas. If you want to build the foundation for the next era of AI and change what humanity can achieve in the process, join us.

About the Role
We're looking for a Senior ML Performance Engineer to architect and lead our Performance Testing Platform from the ground up. You'll be the technical authority on how we measure, validate, and optimize the performance of large language models, including Llama 3.2 70B, DeepSeek, and others, before and after compiler optimization on modern GPU architectures.

This is a high-impact role at the intersection of ML systems, GPU architecture, and performance engineering.
You'll build the infrastructure that proves our compiler delivers real, measurable value, and you'll work directly with compiler and ML engineers to drive the optimizations that get us there.

What You'll Do
- Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
- Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
- Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
- Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
- Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
- Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
- Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
- Document best practices for performance testing and optimization of ML workloads on GPU hardware

Essential Skills and Experience
- BS degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
- 7+ years of experience in performance engineering, benchmarking, or systems engineering roles
- Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
- Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
- Strong programming skills in Python and C/C++
- Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
- Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
- Proficiency with profiling and debugging tools for GPU workloads
- Strong analytical
skills with the ability to design experiments, analyze results, and communicate findings clearly
- Experience with CI/CD systems and test automation frameworks

Preferred Skills and Experience
- Master's or PhD degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
- Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem
- Knowledge of compiler optimization techniques and their impact on performance
- Experience with distributed inference and multi-GPU workloads
- Familiarity with ML model quantization, pruning, and other optimization techniques
- Background in high-performance computing or systems-level optimization
- Experience with containerization and infrastructure-as-code tooling (Kubernetes, Docker, Terraform)
- Contributions to open-source ML or systems projects

Personal Attributes
- Precision-driven: you catch the 2% regression that others miss.
- Self-directed: you take ownership and don't wait for permission to solve problems.
- Collaborative: you work well across teams and actively help others succeed.
- Clear communicator: you can explain complex technical concepts to engineers and stakeholders alike.

Why Join Lemurian Labs
- Build the performance testing infrastructure that validates the future of efficient AI.
- Own a high-visibility platform that directly influences product quality and customer success.
- Work with cutting-edge GPU hardware and next-generation LLMs.
- Competitive compensation including equity, medical/dental/vision, retirement savings, and wellness benefits.

Lemurian Labs is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees, regardless of gender identity, race, ethnicity, sexual orientation, disability status, age, or background.

Compensation depends on experience and geographic location; the range will be narrowed during the interview process.
Additional benefits include equity, company bonus opportunities, medical, dental, and vision coverage, a retirement savings plan, and supplemental wellness benefits.