Senior Product Manager
Senior Product Manager: AI Platform & Managed Services ($600M in funding)
$250K + Equity
San Francisco, Bay Area - Onsite 5 days a week

We are building a next-generation AI platform that enables developers and enterprises to train, deploy, and operate large-scale machine learning systems across heterogeneous compute environments.

Our focus is on making cutting-edge AI infrastructure usable, scalable, and efficient - abstracting away the complexity of distributed systems, multi-GPU orchestration, and model lifecycle management into a cohesive, developer-first platform.

This is a high-impact role at the intersection of AI infrastructure, distributed systems, and developer platforms, where you will define and build the product layer that sits between raw compute and real-world AI applications.

You will lead product across key areas of our AI platform and managed services stack, including:

Inference & Model Serving Platforms
- Design systems for high-throughput, low-latency inference across LLMs, diffusion models, and multimodal workloads
- Define abstractions for batching, scheduling, caching, and model optimization (quantization, compilation, etc.)
- Balance performance, cost, and reliability across diverse workloads

AI Platform & Developer Experience
- Build APIs, SDKs, and workflows that enable developers to go from model to production seamlessly
- Define primitives for fine-tuning, evaluation, deployment, and observability
- Simplify complex infrastructure into intuitive, composable building blocks

Multi-Cluster / Multi-Vendor Compute Orchestration
- Work on scheduling and workload placement across heterogeneous environments (GPU/CPU, multi-region, multi-cloud)
- Partner with engineering on resource allocation, queuing systems, and capacity-aware scheduling

Observability, Evaluation & Cost Governance
- Define telemetry systems for model performance, latency, token usage, and failure modes
- Build evaluation workflows for LLM quality, safety, and regression detection
- Introduce cost controls and optimization strategies for large-scale inference and training

Managed AI Services
- Package infrastructure into opinionated, production-ready services for enterprise customers
- Define SLAs, reliability models, and deployment patterns for mission-critical workloads
- Work closely with customers to understand real-world constraints and translate them into product capabilities

We're looking for product leaders who can operate at depth across both systems and product, and who are excited about building the foundation for the next generation of AI applications.

You likely have:
- Experience building AI/ML platforms, inference systems, or developer-facing infrastructure
- Strong understanding of distributed systems, cloud infrastructure, and performance trade-offs
- Familiarity with modern AI stacks:
  - LLMs, transformers, diffusion models
  - Frameworks like PyTorch, TensorRT, ONNX, vLLM, Triton, etc.

AI is undergoing a platform shift. The gap between raw infrastructure and usable systems is still enormous.
This role is about closing that gap: turning fragmented, complex infrastructure into a coherent platform that developers can rely on to build real products.

You'll be working on problems like:
- How to make LLM inference predictable and cost-efficient at scale
- How to expose the right abstractions for agentic workflows
- How to manage heterogeneous compute without leaking complexity to users
- How to make AI systems observable, debuggable, and reliable

If you're excited about building the systems that power the next generation of AI applications, apply now!