Senior Data Scientist
About Probably Genetic

Probably Genetic is changing the lives of patients living with severe, complex diseases. Our data platform is used by drug developers and patient advocacy groups to develop and launch treatments for these patients. Our technology discovers undiagnosed patients online, analyzes their disease state using machine learning and at-home testing, and enables compliant communication with patients. In doing so, we help patients access diagnoses, clinical trials, and treatments as early as possible.

We are a tight-knit group of hard-working, ambitious problem solvers united by a mission greater than ourselves. We do well by doing right by patients. We are developing some of the most cutting-edge solutions in healthcare, and our roadmap is packed with innovations in bioinformatics, AI, and drug development. We have built a lean, all-star team to help us bring our vision to life, and we want you to be a part of it.

Probably Genetic has raised multiple rounds of funding from Silicon Valley’s best investors, including Threshold, Khosla, and Y Combinator, and offers competitive salaries, comprehensive benefits, and meaningful early-stage equity.

About The Role

We are looking for a Senior Data Scientist who will own some of the most consequential diagnostic AI in rare disease: building, validating, and operationalizing the models that help us find and diagnose patients who have never had a name for their disease; powering the analytical rigor behind our testing programs; and shaping how we use data to make smarter product decisions.

What you will do

Own the end-to-end development, validation, and operationalization (from feature engineering through production deployment) of PG's predictive diagnostic AI models, which power program eligibility decisions and clinical decisions for patients
Run prospective testing experiments: apply diagnostic models to undiagnosed patients, coordinate testing, and track outcomes to continuously improve model performance
Build and maintain PG's synthetic patient data pipeline, a critical deliverable for our research programs and a key input to our own model development lifecycle
Optimize our patient intake experience using NLP and multimodal data analysis to determine which questions to ask, and in what order, to maximize data quality and conversion
Own API usage and cost optimization across PG's AI stack, including prompt engineering, model evaluation, and ongoing performance monitoring
Conduct ad hoc strategic analyses that inform product prioritization and causality assessment and that generate customer-facing program insights
Establish MLOps infrastructure: model monitoring, drift detection, API observability, and lightweight but durable operational processes
Have the freedom to pursue blue-sky research initiatives aimed at creating value from our data
Work with Data Engineering to build a robust, scalable data foundation that supports all of the above

Who you are

We are looking for a few specific things that will help you succeed in this role:
7+ years of experience in data science, machine learning engineering, or a closely related field
Strong Python proficiency and fluency across the core data science stack: pandas, NumPy, scikit-learn, PySpark, and SQL
Demonstrated end-to-end ML experience: you have taken models from problem definition through feature engineering, validation, deployment, and monitoring in a production environment
Experience with NLP techniques and applying language models to real-world problems
Comfort with prompt engineering and evaluating the performance of external AI APIs (e.g., OpenAI)
A track record of operating with high ownership in lean, fast-moving environments where you have had to build structure as much as execute within it
Strong analytical communication skills: you can translate complex model outputs and data findings into clear, actionable narratives for technical and non-technical audiences alike

Some things that are not required, but that you will learn on the job:
Experience with Databricks or similar lakehouse/ML platform environments
Familiarity with synthetic data generation techniques
Domain knowledge in healthcare, rare disease, genomics, or clinical research
Experience with MLOps tooling and building observability infrastructure from scratch
Exposure to biopharma or insurance analytics use cases

What We Offer At Probably Genetic

An engaging and supportive team, all on a mission to improve lives
Fair and equitable compensation with competitive early-stage equity grants
A generous flexible time-off policy that we actually use
Parental leave benefits (12 weeks for both birthing and non-birthing parents)
Hybrid, flexible work with high trust and autonomy
A bright, inviting, pet-friendly office in Downtown SF near transit
A “work from anywhere” policy, up to 4 weeks a year
Regular team retreats in exciting destinations
Health benefits, including medical, dental, vision, therapy, and FSA, plus a 401(k)
And so much more!

Probably Genetic is committed to fostering a welcoming and inclusive work environment for people of all genders, sexualities, ethnicities, socioeconomic backgrounds, and life experiences. We urge candidates of all backgrounds to apply. If you require specific accommodations as you interview or consider working with us, please let us know.