Senior ML Performance Engineer
Santa Clara, CA · April 3rd, 2026
Job Description
About Us
At Lemurian Labs, we're reimagining the foundations of computing to make AI accessible to everyone. Our mission is to remove the limits of scale, hardware, and cost that hold back innovation, so the people solving humanity's hardest problems can move faster.

We're building a new kind of software stack: a hardware-agnostic platform that makes every system, from a laptop to a supercomputer, feel like one seamless engine. Developers can write once, run anywhere, and get state-of-the-art performance across any chip, any cloud, at any scale. It's a complete rethink of how software and hardware interact, designed for the era beyond Moore's Law.

We're not looking for the comfortable or the conventional; we're looking for the bold: engineers who crave frontier problems, who want to bend the limits of what's possible, and who see infrastructure not as a constraint but as a canvas. If you want to build the foundation for the next era of AI and change what humanity can achieve in the process, join us.

About the Role
We're looking for a Senior ML Performance Engineer to architect and lead our Performance Testing Platform from the ground up. You'll be the technical authority on how we measure, validate, and optimize the performance of large language models, including Llama 3.2 70B, DeepSeek, and others, before and after compiler optimization on modern GPU architectures.

This is a high-impact role at the intersection of ML systems, GPU architecture, and performance engineering.
You'll build the infrastructure that proves our compiler delivers real, measurable value, and you'll work directly with compiler and ML engineers to drive the optimizations that get us there.

What You'll Do
- Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
- Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
- Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
- Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
- Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
- Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
- Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
- Document best practices for performance testing and optimization of ML workloads on GPU hardware

Essential Skills and Experience
- BS degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
- 7+ years of experience in performance engineering, benchmarking, or systems engineering roles
- Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
- Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
- Strong programming skills in Python and C/C++
- Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
- Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
- Proficiency with profiling and debugging tools for GPU workloads
- Strong analytical
skills with the ability to design experiments, analyze results, and communicate findings clearly
- Experience with CI/CD systems and test automation frameworks

Preferred Skills and Experience
- Master's or PhD degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
- Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem
- Knowledge of compiler optimization techniques and their impact on performance
- Experience with distributed inference and multi-GPU workloads
- Familiarity with ML model quantization, pruning, and other optimization techniques
- Background in high-performance computing or systems-level optimization
- Experience with containerization and infrastructure-as-code tooling (Kubernetes, Docker, Terraform)
- Contributions to open-source ML or systems projects

Personal Attributes
- Precision-driven: you catch the 2% regression that others miss.
- Self-directed: you take ownership and don't wait for permission to solve problems.
- Collaborative: you work well across teams and actively help others succeed.
- Clear communicator: you can explain complex technical concepts to engineers and stakeholders alike.

Why Join Lemurian Labs
- Build the performance testing infrastructure that validates the future of efficient AI.
- Own a high-visibility platform that directly influences product quality and customer success.
- Work with cutting-edge GPU hardware and next-generation LLMs.
- Competitive compensation including equity, medical/dental/vision, retirement savings, and wellness benefits.

Lemurian Labs is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees, regardless of gender identity, race, ethnicity, sexual orientation, disability status, age, or background.

Compensation depends on experience and geographic location; the range will be narrowed during the interview process.
Additional benefits include equity, company bonus opportunities, medical, dental, and vision coverage, a retirement savings plan, and supplemental wellness benefits.