AI/ML Scientist, Protein Foundation Models
Manifold Bio is a platform biotechnology company pioneering AI-guided protein design and massively multiplexed in vivo screening to unlock tissue-targeted medicines and organism-scale models of living systems. Using proprietary molecular barcoding technology, we screen hundreds of thousands of protein designs simultaneously in living systems, producing in vivo-validated datasets at a scale no one else can match. These datasets power our computational models, which in turn yield better drug designs, creating a flywheel that gets stronger with every campaign. Our team of protein engineers, biologists, and computational scientists works across this full stack to pursue programs both internally and with leading pharma companies.

Position

Manifold's AI team is actively training protein foundation models on our proprietary experimental datasets. Our generative antibody design model, mBER, has already demonstrated controllable de novo binder design across multiple million-scale screening campaigns, and the team is now scaling foundation model capabilities to push well beyond current performance. We are looking for an AI/ML Scientist to join this effort. You will work alongside our existing model training team to accelerate the development of foundation models fine-tuned on Manifold's data, bringing additional depth in pretraining methodology, architecture development, and large-scale training. Your work will directly improve mBER's design capabilities and unlock new modeling paradigms for the broader team. You'll own foundation model projects end-to-end, from architecture selection and training infrastructure to evaluation against real experimental outcomes, while contributing to the team's shared research agenda.

This is an on-site role and can be based in either Boston, Massachusetts or San Francisco, California.
Please only apply if you reside in one of these cities or are open to relocating.

Responsibilities

Advance the team's ongoing foundation model training efforts: pretraining, fine-tuning, and evaluating folding, docking, language, and generative design models on Manifold's proprietary experimental data
Bring depth in training methodology, architecture selection, and optimization to complement the existing team's expertise
Develop and scale training pipelines for distributed, multi-GPU and multi-node training runs
Integrate foundation model outputs into mBER to improve binder design success rates and enable new design capabilities
Design and execute ML experiments with clear hypotheses, rigorous evaluation frameworks, and systematic analysis
Establish best practices for mixed-precision training, gradient checkpointing, and computational efficiency at scale
Produce clear documentation and analysis supporting architecture and training decisions

Required Qualifications

Demonstrated experience pretraining and/or fine-tuning protein foundation models (folding, docking, language models, or generative design) with published or otherwise demonstrable results
Strong familiarity with the AlphaFold architecture and training methodology
2+ years of hands-on experience with PyTorch and/or JAX for deep learning
Experience with large-scale model training: distributed training, multi-GPU/multi-node setups, mixed precision, gradient checkpointing
Solid understanding of deep learning architectures (transformers, attention mechanisms, diffusion/flow matching) and optimization techniques
Experience working with protein structure data (PDB, mmCIF) and/or protein sequence datasets
Strong statistical analysis and experimental design skills
Proficiency in the Python scientific computing stack (NumPy, Pandas, scikit-learn)
Self-directed researcher who can balance guidance with independence
Excellent written and verbal communication skills for cross-functional collaboration

Preferred Qualifications

Experience with protein generative design methods (e.g., RFdiffusion, ProteinMPNN, flow matching approaches)
Experience with protein language models (e.g., the ESM family)
Published research in computational biology, protein design, or structural biology
Experience training on proprietary or domain-specific biological datasets
Familiarity with Ray for distributed computing
Experience with Kubernetes (EKS) and cloud computing platforms (AWS)
Knowledge of protein engineering, directed evolution, or structural biology wet lab techniques
Experience working with agentic AI coding tools for fast, parallelized execution of modeling experiments
Previous biotech/pharma industry experience

This Role Might Be Perfect For You If:

You have deep experience training protein foundation models and want to apply that expertise to some of the richest proprietary experimental datasets in the field
You're excited about pushing beyond public model performance by leveraging unique, large-scale in vivo screening data
You thrive in high-ownership roles where you can drive research direction while collaborating with a tight-knit, world-class team
You want your models to directly impact real drug discovery programs

If you're excited to train the next generation of protein foundation models on uniquely powerful experimental data, please reach out to careers@manifold.bio.

Base Salary Range: $140,000-$225,000

This reflects the typical offer range for this role, based on experience, role scope, and internal equity.
Final compensation decisions are made using a consistent leveling framework and consider the candidate's experience, interview performance, and expected impact.

This role is eligible for:
Annual performance-based target bonus
Stock options
Comprehensive medical, dental, and vision coverage
401(k) plan
Flexible paid time off and holidays
Perks including an on-site gym, on-site lunch, and commuter support

Our compensation ranges are reviewed annually to ensure alignment with market trends and internal equity.

We value different experiences and ways of thinking, and we believe the most talented teams are built by bringing together people of diverse cultures, genders, and backgrounds.