{"schemaVersion":"jobsearcher.job.v1","id":"4dee1b68b74ded427a3f08f2","url":"https://jobsearcher.com/jobs/4dee1b68b74ded427a3f08f2","canonicalUrl":"https://jobsearcher.com/jobs/4dee1b68b74ded427a3f08f2","title":"Applied ML Engineer","description":"About Knowtex\nKnowtex is building the future of voice AI operating systems for clinicians, transforming how healthcare documentation happens at the point of care. Founded by Stanford AI scientists with deep clinical experience, we're experiencing explosive growth across both commercial health systems and federal healthcare, with our ambient documentation platform scaling rapidly to thousands of clinicians across hundreds of specialties. We're at an inflection point where cutting-edge AI meets real clinical impact, giving clinicians hours back each day to focus on what matters most - their patients.\n\nPosition Overview\nWe are seeking an Applied ML Engineer to productionize and scale machine learning systems powering our voice AI platform. This role bridges research and engineering, transforming models into reliable, low-latency, production-grade systems deployed across enterprise healthcare environments.\n\nYou will work closely with ML Scientists, Backend Engineers, and Platform teams to optimize inference performance, build evaluation pipelines, and ensure robust model deployment in regulated environments.\n\nKey Responsibilities\nProductionize ML models for real-time clinical applications\nOptimize inference pipelines for low latency and high throughput\nDeploy and scale models using AWS-based infrastructure\nBuild automated evaluation and regression testing frameworks for LLM outputs\nImplement monitoring systems for model performance and drift detection\nCollaborate with Backend teams to integrate ML services into APIs and workflows\nImprove model efficiency through quantization, batching, caching, and optimization techniques\nSupport specialty-level model evaluation and performance analysis\nContribute to CI/CD workflows for ML 
deployment\n\nRequired Qualifications\n3–7+ years of experience in machine learning engineering or applied ML roles\nStrong proficiency in Python and PyTorch (or TensorFlow)\nExperience deploying ML models in production environments\nFamiliarity with transformer architectures and large language models\nExperience with model optimization techniques (quantization, distillation, pruning)\nExperience working with cloud infrastructure (AWS preferred)\nStrong software engineering fundamentals and debugging skills\n\nPreferred Qualifications\nExperience with speech recognition systems or NLP pipelines\nExperience with Triton Inference Server or similar deployment frameworks\nFamiliarity with healthcare data or clinical documentation workflows\nExperience working in regulated environments (HIPAA, GovCloud, etc.)\nKnowledge of medical coding systems (ICD-10, CPT)\n\nTechnical Environment\nPython, PyTorch / TensorFlow\nTransformer-based LLM architectures\nAWS (SageMaker, ECS, Lambda, S3)\nTriton Inference Server\nCI/CD pipelines for ML deployment\nObservability tools for performance and drift monitoring\n\nCompensation & Benefits\nMeaningful equity compensation\nUnlimited PTO\nPremium health, dental, and vision coverage\n401(k) plan","company":"Knowtex","rawCompany":"knowtex","city":"Millbrae","state":"CA","isRemote":false,"isActive":false,"createdAt":"2026-04-12T18:16:43.325Z","occupations":[{"code":"15-1252.00","title":"Software Developers","slug":"software-developers"},{"code":"15-1299.08","title":"Computer Systems Engineers/Architects","slug":"computer-systems-engineers-architects"},{"code":"15-2051.00","title":"Data Scientists","slug":"data-scientists"}],"industries":[{"code":"541511","title":"Custom Computer Programming Services","slug":"custom-computer-programming-services"},{"code":"541512","title":"Computer Systems Design Services","slug":"computer-systems-design-services"},{"code":"541990","title":"All Other Professional, Scientific, and Technical 
Services","slug":"all-other-professional-scientific-and-technical-services"}],"jobPosting":{"@context":"https://schema.org","@type":"JobPosting","title":"Applied ML Engineer","description":"About Knowtex\nKnowtex is building the future of voice AI operating systems for clinicians, transforming how healthcare documentation happens at the point of care. Founded by Stanford AI scientists with deep clinical experience, we're experiencing explosive growth across both commercial health systems and federal healthcare, with our ambient documentation platform scaling rapidly to thousands of clinicians across hundreds of specialties. We're at an inflection point where cutting-edge AI meets real clinical impact, giving clinicians hours back each day to focus on what matters most - their patients.\n\nPosition Overview\nWe are seeking an Applied ML Engineer to productionize and scale machine learning systems powering our voice AI platform. This role bridges research and engineering, transforming models into reliable, low-latency, production-grade systems deployed across enterprise healthcare environments.\n\nYou will work closely with ML Scientists, Backend Engineers, and Platform teams to optimize inference performance, build evaluation pipelines, and ensure robust model deployment in regulated environments.\n\nKey Responsibilities\nProductionize ML models for real-time clinical applications\nOptimize inference pipelines for low latency and high throughput\nDeploy and scale models using AWS-based infrastructure\nBuild automated evaluation and regression testing frameworks for LLM outputs\nImplement monitoring systems for model performance and drift detection\nCollaborate with Backend teams to integrate ML services into APIs and workflows\nImprove model efficiency through quantization, batching, caching, and optimization techniques\nSupport specialty-level model evaluation and performance analysis\nContribute to CI/CD workflows for ML deployment\n\nRequired Qualifications\n3–7+ years of experience in machine 
learning engineering or applied ML roles\nStrong proficiency in Python and PyTorch (or TensorFlow)\nExperience deploying ML models in production environments\nFamiliarity with transformer architectures and large language models\nExperience with model optimization techniques (quantization, distillation, pruning)\nExperience working with cloud infrastructure (AWS preferred)\nStrong software engineering fundamentals and debugging skills\n\nPreferred Qualifications\nExperience with speech recognition systems or NLP pipelines\nExperience with Triton Inference Server or similar deployment frameworks\nFamiliarity with healthcare data or clinical documentation workflows\nExperience working in regulated environments (HIPAA, GovCloud, etc.)\nKnowledge of medical coding systems (ICD-10, CPT)\n\nTechnical Environment\nPython, PyTorch / TensorFlow\nTransformer-based LLM architectures\nAWS (SageMaker, ECS, Lambda, S3)\nTriton Inference Server\nCI/CD pipelines for ML deployment\nObservability tools for performance and drift monitoring\n\nCompensation & Benefits\nMeaningful equity compensation\nUnlimited PTO\nPremium health, dental, and vision coverage\n401(k) plan","datePosted":"2026-04-12T18:16:43.325Z","dateModified":"2026-04-12T18:16:43.325Z","hiringOrganization":{"@type":"Organization","name":"Knowtex","sameAs":"https://jobsearcher.com"},"jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Millbrae","addressRegion":"CA","addressCountry":"US"}},"identifier":{"@type":"PropertyValue","name":"JobSearcher","value":"4dee1b68b74ded427a3f08f2"},"url":"https://jobsearcher.com/jobs/4dee1b68b74ded427a3f08f2"}}