{"schemaVersion":"jobsearcher.job.v1","id":"c4e3d35fdcb1bd269c52de3d","url":"https://jobsearcher.com/jobs/c4e3d35fdcb1bd269c52de3d","canonicalUrl":"https://jobsearcher.com/jobs/c4e3d35fdcb1bd269c52de3d","title":"Software Engineer, ML Serving","description":"Rime is a foundation modeling company that builds voice AI for enterprises running customer experiences at scale. Our models are purpose-built for high-volume conversational deployments, engineered for the accuracy, performance, and deployment flexibility that production environments actually demand.\r\nWe started from a different premise than the rest of the field: build voice AI for human connection, not slop. Before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech of normal people, recorded and annotated by linguists. It's why our models are unparalleled in naturalism, and it's why enterprises pick Rime when pilots need to make it to production.\r\nRole Overview\r\nWe're hiring a Software Engineer to own the serving infrastructure that connects Rime's inference engines to the world. This role sits at the intersection of ML systems and cloud infrastructure — you'll work directly on model inference and cloud infrastructure to build, harden, and scale the systems that stream voice at real-time latency. 
As Rime moves toward its next-generation architecture, you'll be a core architect of how our models get served.\r\nWhat You'll Own\r\nArchitecture and implementation of Rime's TTS serving infrastructure, from GPU-backed inference engines to the API surface.\r\nModel optimization from single-node to disaggregated fleet serving.\r\nCompatibility with NVIDIA hardware generations from Hopper to Blackwell and beyond, for on-prem and cloud deployments.\r\nContinuous integration and deployment workflows for the model serving pipeline.\r\nSite reliability: on-call rotation, monitoring, alerting, and observability across the serving stack.\r\nResource provisioning and cost management across our GPU fleet.\r\nWhat We're Looking For\r\nHands-on experience with real-time multinode ML serving infrastructure and serving frameworks: NVIDIA Dynamo/Triton, vLLM, SGLang, or equivalent.\r\nExperience with distributed or disaggregated model serving (Tensor Parallel, Pipeline Parallel, or equivalent).\r\nStrong cloud infrastructure fundamentals: Linux internals, networking, containerization (Docker, Kubernetes).\r\nIaC experience: Terraform, Packer, or comparable. You should have opinions about how to do this right.\r\nOn-call is part of the job. 
You treat production reliability as a shared responsibility.\r\nNice to Have\r\nExperience with multinode training (DDP, FSDP, etc.).\r\nExperience with gRPC or other bidirectional binary streaming protocols.\r\nExperience with audio streaming and related technologies (WebRTC, WebSockets, etc.).\r\nExperience with a multilingual monorepo where you pick the best language on merit rather than personal familiarity.\r\nExperience with multi-cloud infrastructures (AWS, GCP, OCI, etc.).\r\nComfort with configuration management tooling (Ansible, Chef, Puppet, or similar).\r\nSRE, DevOps, or platform engineering background at a startup.\r\nExperience at an early-stage company.\r\nWhy Join Rime\r\nBuild the serving infrastructure behind a category-defining voice AI company from the ground up.\r\nYou will bring in experience that no one else currently has at the company: you can help us set the vision.\r\nDirect collaboration with the inference, platform, and ML teams, with no handoff culture.\r\nThe systems you build determine what experiences our customers can deploy at scale.\r\nMeaningful equity upside at an early stage.\r\nHigh ownership, high standards, low bureaucracy.\r\nSF / Bay Area.\r\nAt Rime, we...\r\nAre outliers\r\nCut through the hype to focus on the craft\r\nMove fast with agency and freedom\r\nMaintain a growth mindset, finding joy in the struggle\r\nDo the right things, knowing that it'll lead to making money.\r\nIf that sounds like you too, you'll be a great fit for Rime!","company":"Rime Labs","rawCompany":"rime labs","city":"Millbrae","state":"CA","isRemote":false,"isActive":false,"createdAt":"2026-06-25T01:16:35.744Z","occupations":[{"code":"15-1252.00","title":"Software Developers","slug":"software-developers"},{"code":"15-1299.08","title":"Computer Systems Engineers/Architects","slug":"computer-systems-engineers-architects"},{"code":"15-1221.00","title":"Computer and Information Research 
Scientists","slug":"computer-and-information-research-scientists"}],"industries":[{"code":"513210","title":"Software Publishers","slug":"software-publishers"},{"code":"541511","title":"Custom Computer Programming Services","slug":"custom-computer-programming-services"},{"code":"541512","title":"Computer Systems Design Services","slug":"computer-systems-design-services"}],"jobPosting":{"@context":"https://schema.org","@type":"JobPosting","title":"Software Engineer, ML Serving","description":"Rime is a foundation modeling company that builds voice AI for enterprises running customer experiences at scale. Our models are purpose-built for high-volume conversational deployments, engineered for the accuracy, performance, and deployment flexibility that production environments actually demand.\r\nWe started from a different premise than the rest of the field: build voice AI for human connection, not slop. Before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech of normal people, recorded and annotated by linguists. It's why our models are unparalleled in naturalism, and it's why enterprises pick Rime when pilots need to make it to production.\r\nRole Overview\r\nWe're hiring a Software Engineer to own the serving infrastructure that connects Rime's inference engines to the world. This role sits at the intersection of ML systems and cloud infrastructure — you'll work directly on model inference and cloud infrastructure to build, harden, and scale the systems that stream voice at real-time latency. 
As Rime moves toward its next-generation architecture, you'll be a core architect of how our models get served.\r\nWhat You'll Own\r\nArchitecture and implementation of Rime's TTS serving infrastructure, from GPU-backed inference engines to the API surface.\r\nModel optimization from single-node to disaggregated fleet serving.\r\nCompatibility with NVIDIA hardware generations from Hopper to Blackwell and beyond, for on-prem and cloud deployments.\r\nContinuous integration and deployment workflows for the model serving pipeline.\r\nSite reliability: on-call rotation, monitoring, alerting, and observability across the serving stack.\r\nResource provisioning and cost management across our GPU fleet.\r\nWhat We're Looking For\r\nHands-on experience with real-time multinode ML serving infrastructure and serving frameworks: NVIDIA Dynamo/Triton, vLLM, SGLang, or equivalent.\r\nExperience with distributed or disaggregated model serving (Tensor Parallel, Pipeline Parallel, or equivalent).\r\nStrong cloud infrastructure fundamentals: Linux internals, networking, containerization (Docker, Kubernetes).\r\nIaC experience: Terraform, Packer, or comparable. You should have opinions about how to do this right.\r\nOn-call is part of the job. 
You treat production reliability as a shared responsibility.\r\nNice to Have\r\nExperience with multinode training (DDP, FSDP, etc.).\r\nExperience with gRPC or other bidirectional binary streaming protocols.\r\nExperience with audio streaming and related technologies (WebRTC, WebSockets, etc.).\r\nExperience with a multilingual monorepo where you pick the best language on merit rather than personal familiarity.\r\nExperience with multi-cloud infrastructures (AWS, GCP, OCI, etc.).\r\nComfort with configuration management tooling (Ansible, Chef, Puppet, or similar).\r\nSRE, DevOps, or platform engineering background at a startup.\r\nExperience at an early-stage company.\r\nWhy Join Rime\r\nBuild the serving infrastructure behind a category-defining voice AI company from the ground up.\r\nYou will bring in experience that no one else currently has at the company: you can help us set the vision.\r\nDirect collaboration with the inference, platform, and ML teams, with no handoff culture.\r\nThe systems you build determine what experiences our customers can deploy at scale.\r\nMeaningful equity upside at an early stage.\r\nHigh ownership, high standards, low bureaucracy.\r\nSF / Bay Area.\r\nAt Rime, we...\r\nAre outliers\r\nCut through the hype to focus on the craft\r\nMove fast with agency and freedom\r\nMaintain a growth mindset, finding joy in the struggle\r\nDo the right things, knowing that it'll lead to making money.\r\nIf that sounds like you too, you'll be a great fit for Rime!","datePosted":"2026-06-25T01:16:35.744Z","dateModified":"2026-06-25T01:16:35.744Z","hiringOrganization":{"@type":"Organization","name":"Rime 
Labs","sameAs":"https://jobsearcher.com"},"jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Millbrae","addressRegion":"CA","addressCountry":"US"}},"identifier":{"@type":"PropertyValue","name":"JobSearcher","value":"c4e3d35fdcb1bd269c52de3d"},"url":"https://jobsearcher.com/jobs/c4e3d35fdcb1bd269c52de3d"}}