{"schemaVersion":"jobsearcher.job.v1","id":"6e5a14e03cb06afd9b9e830b","url":"https://jobsearcher.com/jobs/6e5a14e03cb06afd9b9e830b","canonicalUrl":"https://jobsearcher.com/jobs/6e5a14e03cb06afd9b9e830b","title":"Data Architect, Data Foundry","description":"At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We're looking for people who are determined to make life better for people around the world.\r\nPosition: Data Architect, Data Foundry\r\nLocation: San Diego, CA; San Francisco, CA; Boston, MA; Louisville, CO; Indianapolis, IN\r\nOverview\r\nLilly Small Molecule Discovery is purpose-built to create molecules that make life better for people. Discovery Technology and Platforms (DTP) accelerates molecule discovery by building optimized foundational platforms, streamlining lab operations through advanced technologies and data connectivity, and investing in novel capabilities.\r\nData Foundry is a multidisciplinary team within DTP that enables AI-native drug discovery through four integrated pillars: Architecture4Insight (data infrastructure and scientific software), Methods4Insight (analytical and computational methods), Automation & Scale4Insight (lab automation and agentic workflows), and Preparedness4Insight (data governance and readiness). 
These pillars empower every Lilly scientist to make optimal decisions by providing seamless access to data, insights, and AI-driven capabilities, serving both human scientists and autonomous AI agents.\r\nPosition Summary\r\nWe are seeking Data Architects at multiple levels to design and build the data infrastructure that makes AI-native drug discovery possible. You will create the schemas, ontologies, data models, knowledge graphs, and platform architectures that transform raw scientific data into machine-actionable, FAIR-compliant, insight-ready assets, serving both discovery scientists and autonomous AI agents.\r\nThis role is the foundation of Architecture4Insight. Everything the software engineering team builds (pipelines, APIs, prototypes) depends on the data models and platform architecture this team designs. You will work with deep knowledge of scientific data (chemical, biological, HTE, automation-generated) to create custom-fit solutions, then partner with Tech@Lilly to scale and maintain them. 
The role spans three focus areas depending on expertise: data modeling & ontologies, data platform & lakehouse architecture, and knowledge graph & specialized data systems.\r\nResponsibilities\r\nData Modeling & Ontologies\r\nDesign and implement data models, schemas, and ontologies for chemical, biological, and automation-generated data that serve discovery workflows across the portfolio.\r\nDefine and maintain controlled vocabularies, metadata standards, and FAIR-compliant data frameworks in partnership with Preparedness4Insight.\r\nImplement semantic data standards (RDF, OWL, SPARQL) and ontology engineering practices to create interoperable, machine-readable scientific data.\r\nData Platform & Lakehouse Architecture\r\nDesign and implement data lakehouse architecture using modern platforms (Databricks, Snowflake, or equivalent), including data storage patterns, partitioning strategies, and query optimization.\r\nBuild and optimize ETL/ELT pipelines using Spark, dbt, or similar tools to transform raw scientific data into analytical and ML-ready formats.\r\nImplement real-time and streaming data integration (Kafka, Kinesis, event-driven patterns) connecting LIMS, instruments, and lab automation systems to the data infrastructure.\r\nKnowledge Graph & Specialized Data Systems\r\nDesign and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph) that capture molecular, target, pathway, and experimental relationships across the discovery landscape.\r\nArchitect specialized data solutions: array databases (TileDB) for genomics/imaging, document stores (MongoDB) for experimental records, and vector databases for embedding-based retrieval supporting ML and RAG workflows.\r\nBuild query and traversal patterns that enable scientists and AI agents to ask relational questions across the entire data landscape.\r\nCross-Functional Partnership\r\nPartner with scientific software engineers to ensure data architectures are implementable, performant, and 
well-documented.\r\nCollaborate with Methods4Insight to design data structures that support analytical model training, deployment, and evaluation.\r\nWork with Tech@Lilly to define scaling strategies, ensure enterprise compliance, and transition data architectures to production-grade management.\r\nContribute to build-versus-buy-versus-adopt decisions by evaluating commercial and open-source data platforms against Data Foundry requirements.\r\nBasic Requirements\r\nB.S. or M.S. in Computer Science, Data Science, Bioinformatics, Computational Biology, Information Science, or related STEM field; Ph.D. valued for ontology and knowledge graph roles.\r\nB.S. with 7+ years and M.S. with 5+ years of data architecture, data engineering, or scientific informatics experience.\r\nSQL skills and experience in multiple database paradigms (relational, graph, document, columnar, key-value).\r\nQualified applicants must be authorized to work in the United States on a full-time basis. Lilly will not provide support for or sponsor work authorization or visas for this role, including but not limited to F-1 CPT, F-1 OPT, F-1 STEM OPT, J-1, H-1B, TN, O-1, E-3, H-1B1, or L-1.\r\nPreferred Qualifications\r\nExpertise in at least one of: data modeling/ontologies, data platform engineering (Databricks, Snowflake, Spark), or graph/specialized databases (Neo4j, Neptune, MongoDB).\r\nFamiliarity with cloud platforms (AWS, Azure, or GCP) and modern data integration patterns.\r\nUnderstanding of scientific data types and experimental workflows in life sciences or pharma (chemical, biological, HTE data).\r\nStrong communication skills with ability to translate data architecture concepts for both technical and scientific audiences.\r\nPharmaceutical or biotech research industry experience, particularly in discovery data management or research informatics.\r\nExperience with semantic web technologies: RDF, OWL, SPARQL, Protégé, or equivalent ontology engineering tools.\r\nHands-on experience with 
graph databases (Neo4j, Neptune, TigerGraph) and knowledge graph design patterns for scientific data.\r\nData lakehouse architecture experience: Databricks (Delta Lake, Unity Catalog), Snowflake, or equivalent; ETL/ELT with Spark, dbt.\r\nExperience with streaming/real-time data platforms (Kafka, Kinesis, Flink) and event-driven architectures.\r\nFamiliarity with LIMS, ELN systems (e.g., Benchling), and laboratory instrument data integration.\r\nExperience with vector databases (Pinecone, Weaviate, pgvector) and embedding-based retrieval for ML/RAG applications.\r\nArray database experience (TileDB, Zarr) for genomics, imaging, or high-dimensional scientific data.\r\nExperience with bioinformatics data formats (FASTA, BAM/CRAM, VCF) and biological sequence databases; familiarity with NGS data pipelines and proteomics data management.\r\nFAIR data principles implementation experience and Data Readiness Level frameworks.\r\nScientific data standards and controlled vocabularies in chemistry (InChI, SMILES) or biology (Gene Ontology, UniProt, pathway databases such as Reactome or KEGG).\r\nLilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form https://careers.lilly.com/us/en/workplace-accommodation for further assistance. 
Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.\r\nLilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.\r\nOur employee resource groups (ERGs) offer strong support networks for their members and are open to all employees. Our current groups include Africa, Middle East, Central Asia Network, Black Employees at Lilly, Chinese Culture Network, Japanese International Leadership Network (JILN), Lilly India Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ+ Allies), Veterans Leadership Network (VLN), Women's Initiative for Leading at Lilly (WILL), enAble (for people with disabilities). Learn more about all of our groups.\r\nActual compensation will depend on a candidate's education, experience, skills, and geographic location. The anticipated wage for this position is $132,000 - $193,600. Full-time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company-sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). 
Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly's compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.","company":"Initial Therapeutics","rawCompany":"initial therapeutics","city":"Millbrae","state":"CA","isRemote":false,"isActive":false,"createdAt":"2026-06-11T13:55:05.903Z","occupations":[{"code":"15-1243.01","title":"Data Warehousing Specialists","slug":"data-warehousing-specialists"},{"code":"15-1243.00","title":"Database Architects","slug":"database-architects"},{"code":"15-2051.00","title":"Data Scientists","slug":"data-scientists"}],"industries":[{"code":"541512","title":"Computer Systems Design Services","slug":"computer-systems-design-services"},{"code":"513210","title":"Software Publishers","slug":"software-publishers"},{"code":"541511","title":"Custom Computer Programming Services","slug":"custom-computer-programming-services"}],"jobPosting":{"@context":"https://schema.org","@type":"JobPosting","title":"Data Architect, Data Foundry","description":"At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We're looking for people who are determined to make life better for people around the world.\r\nPosition: Data Architect, Data Foundry\r\nLocation: San Diego, CA; San Francisco, CA; Boston, MA; Louisville, CO; Indianapolis, IN\r\nOverview\r\nLilly Small Molecule Discovery is purpose-built to create molecules that make life better for people. 
Discovery Technology and Platforms (DTP) accelerates molecule discovery by building optimized foundational platforms, streamlining lab operations through advanced technologies and data connectivity, and investing in novel capabilities.\r\nData Foundry is a multidisciplinary team within DTP that enables AI-native drug discovery through four integrated pillars: Architecture4Insight (data infrastructure and scientific software), Methods4Insight (analytical and computational methods), Automation & Scale4Insight (lab automation and agentic workflows), and Preparedness4Insight (data governance and readiness). These pillars empower every Lilly scientist to make optimal decisions by providing seamless access to data, insights, and AI-driven capabilities, serving both human scientists and autonomous AI agents.\r\nPosition Summary\r\nWe are seeking Data Architects at multiple levels to design and build the data infrastructure that makes AI-native drug discovery possible. You will create the schemas, ontologies, data models, knowledge graphs, and platform architectures that transform raw scientific data into machine-actionable, FAIR-compliant, insight-ready assets, serving both discovery scientists and autonomous AI agents.\r\nThis role is the foundation of Architecture4Insight. Everything the software engineering team builds (pipelines, APIs, prototypes) depends on the data models and platform architecture this team designs. You will work with deep knowledge of scientific data (chemical, biological, HTE, automation-generated) to create custom-fit solutions, then partner with Tech@Lilly to scale and maintain them. 
The role spans three focus areas depending on expertise: data modeling & ontologies, data platform & lakehouse architecture, and knowledge graph & specialized data systems.\r\nResponsibilities\r\nData Modeling & Ontologies\r\nDesign and implement data models, schemas, and ontologies for chemical, biological, and automation-generated data that serve discovery workflows across the portfolio.\r\nDefine and maintain controlled vocabularies, metadata standards, and FAIR-compliant data frameworks in partnership with Preparedness4Insight.\r\nImplement semantic data standards (RDF, OWL, SPARQL) and ontology engineering practices to create interoperable, machine-readable scientific data.\r\nData Platform & Lakehouse Architecture\r\nDesign and implement data lakehouse architecture using modern platforms (Databricks, Snowflake, or equivalent), including data storage patterns, partitioning strategies, and query optimization.\r\nBuild and optimize ETL/ELT pipelines using Spark, dbt, or similar tools to transform raw scientific data into analytical and ML-ready formats.\r\nImplement real-time and streaming data integration (Kafka, Kinesis, event-driven patterns) connecting LIMS, instruments, and lab automation systems to the data infrastructure.\r\nKnowledge Graph & Specialized Data Systems\r\nDesign and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph) that capture molecular, target, pathway, and experimental relationships across the discovery landscape.\r\nArchitect specialized data solutions: array databases (TileDB) for genomics/imaging, document stores (MongoDB) for experimental records, and vector databases for embedding-based retrieval supporting ML and RAG workflows.\r\nBuild query and traversal patterns that enable scientists and AI agents to ask relational questions across the entire data landscape.\r\nCross-Functional Partnership\r\nPartner with scientific software engineers to ensure data architectures are implementable, performant, and 
well-documented.\r\nCollaborate with Methods4Insight to design data structures that support analytical model training, deployment, and evaluation.\r\nWork with Tech@Lilly to define scaling strategies, ensure enterprise compliance, and transition data architectures to production-grade management.\r\nContribute to build-versus-buy-versus-adopt decisions by evaluating commercial and open-source data platforms against Data Foundry requirements.\r\nBasic Requirements\r\nB.S. or M.S. in Computer Science, Data Science, Bioinformatics, Computational Biology, Information Science, or related STEM field; Ph.D. valued for ontology and knowledge graph roles.\r\nB.S. with 7+ years and M.S. with 5+ years of data architecture, data engineering, or scientific informatics experience.\r\nSQL skills and experience in multiple database paradigms (relational, graph, document, columnar, key-value).\r\nQualified applicants must be authorized to work in the United States on a full-time basis. Lilly will not provide support for or sponsor work authorization or visas for this role, including but not limited to F-1 CPT, F-1 OPT, F-1 STEM OPT, J-1, H-1B, TN, O-1, E-3, H-1B1, or L-1.\r\nPreferred Qualifications\r\nExpertise in at least one of: data modeling/ontologies, data platform engineering (Databricks, Snowflake, Spark), or graph/specialized databases (Neo4j, Neptune, MongoDB).\r\nFamiliarity with cloud platforms (AWS, Azure, or GCP) and modern data integration patterns.\r\nUnderstanding of scientific data types and experimental workflows in life sciences or pharma (chemical, biological, HTE data).\r\nStrong communication skills with ability to translate data architecture concepts for both technical and scientific audiences.\r\nPharmaceutical or biotech research industry experience, particularly in discovery data management or research informatics.\r\nExperience with semantic web technologies: RDF, OWL, SPARQL, Protégé, or equivalent ontology engineering tools.\r\nHands-on experience with 
graph databases (Neo4j, Neptune, TigerGraph) and knowledge graph design patterns for scientific data.\r\nData lakehouse architecture experience: Databricks (Delta Lake, Unity Catalog), Snowflake, or equivalent; ETL/ELT with Spark, dbt.\r\nExperience with streaming/real-time data platforms (Kafka, Kinesis, Flink) and event-driven architectures.\r\nFamiliarity with LIMS, ELN systems (e.g., Benchling), and laboratory instrument data integration.\r\nExperience with vector databases (Pinecone, Weaviate, pgvector) and embedding-based retrieval for ML/RAG applications.\r\nArray database experience (TileDB, Zarr) for genomics, imaging, or high-dimensional scientific data.\r\nExperience with bioinformatics data formats (FASTA, BAM/CRAM, VCF) and biological sequence databases; familiarity with NGS data pipelines and proteomics data management.\r\nFAIR data principles implementation experience and Data Readiness Level frameworks.\r\nScientific data standards and controlled vocabularies in chemistry (InChI, SMILES) or biology (Gene Ontology, UniProt, pathway databases such as Reactome or KEGG).\r\nLilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form https://careers.lilly.com/us/en/workplace-accommodation for further assistance. 
Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.\r\nLilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.\r\nOur employee resource groups (ERGs) offer strong support networks for their members and are open to all employees. Our current groups include Africa, Middle East, Central Asia Network, Black Employees at Lilly, Chinese Culture Network, Japanese International Leadership Network (JILN), Lilly India Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ+ Allies), Veterans Leadership Network (VLN), Women's Initiative for Leading at Lilly (WILL), enAble (for people with disabilities). Learn more about all of our groups.\r\nActual compensation will depend on a candidate's education, experience, skills, and geographic location. The anticipated wage for this position is $132,000 - $193,600. Full-time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company-sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). 
Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly's compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.","datePosted":"2026-06-11T13:55:05.903Z","dateModified":"2026-06-11T13:55:05.903Z","hiringOrganization":{"@type":"Organization","name":"Initial Therapeutics","sameAs":"https://jobsearcher.com"},"jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Millbrae","addressRegion":"CA","addressCountry":"US"}},"identifier":{"@type":"PropertyValue","name":"JobSearcher","value":"6e5a14e03cb06afd9b9e830b"},"url":"https://jobsearcher.com/jobs/6e5a14e03cb06afd9b9e830b"}}