Senior Product Manager
Senior Product Manager: AI Platform & Managed Services ($600M in funding)
$250K + Equity
San Francisco Bay Area, onsite 5 days a week
We are building a next-generation AI platform that enables developers and enterprises to train, deploy, and operate large-scale machine learning systems across heterogeneous compute environments.
Our focus is on making cutting-edge AI infrastructure usable, scalable, and efficient: abstracting away the complexity of distributed systems, multi-GPU orchestration, and model lifecycle management into a cohesive, developer-first platform.
This is a high-impact role at the intersection of AI infrastructure, distributed systems, and developer platforms, where you will define and build the product layer that sits between raw compute and real-world AI applications.
You will lead product across key areas of our AI platform and managed services stack, including:
Inference & Model Serving Platforms
Design systems for high-throughput, low-latency inference across LLMs, diffusion models, and multimodal workloads
Define abstractions for batching, scheduling, caching, and model optimization (quantization, compilation, etc.)
Balance performance, cost, and reliability across diverse workloads
AI Platform & Developer Experience
Build APIs, SDKs, and workflows that take developers from model to production seamlessly
Define primitives for fine-tuning, evaluation, deployment, and observability
Simplify complex infrastructure into intuitive, composable building blocks
Multi-Cluster / Multi-Vendor Compute Orchestration
Work on scheduling and workload placement across heterogeneous environments (GPU/CPU, multi-region, multi-cloud)
Partner with engineering on resource allocation, queuing systems, and capacity-aware scheduling
Observability, Evaluation & Cost Governance
Define telemetry systems for model performance, latency, token usage, and failure modes
Build evaluation workflows for LLM quality, safety, and regression detection
Introduce cost controls and optimization strategies for large-scale inference and training
Managed AI Services
Package infrastructure into opinionated, production-ready services for enterprise customers
Define SLAs, reliability models, and deployment patterns for mission-critical workloads
Work closely with customers to understand real-world constraints and translate them into product capabilities
We're looking for product leaders who can operate in depth across both systems and product, and who are excited about building the foundation for the next generation of AI applications.
You likely have:
Experience building AI/ML platforms, inference systems, or developer-facing infrastructure
Strong understanding of distributed systems, cloud infrastructure, and performance trade-offs
Familiarity with modern AI stacks:
LLMs, transformers, diffusion models
Frameworks like PyTorch, TensorRT, ONNX, vLLM, Triton, etc.
AI is undergoing a platform shift. The gap between raw infrastructure and usable systems is still enormous. This role is about closing that gap: turning fragmented, complex infrastructure into a coherent platform that developers can rely on to build real products.
You'll be working on problems like:
How to make LLM inference predictable and cost-efficient at scale
How to expose the right abstractions for agentic workflows
How to manage heterogeneous compute without leaking complexity to users
How to make AI systems observable, debuggable, and reliable
If you're excited about building the systems that power the next generation of AI applications, apply now!