Senior Data Architect, Integrated Data Platform
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Digitech Services, is seeking the following. Apply via Dice today!

Location: San Francisco Bay Area, CA
Contract position

Role Overview:
Client is seeking a Senior Data Architect to lead the data modeling and platform design for a next-generation Integrated Data Platform (IDP) supporting a regulated medical imaging program at a global pharmaceutical and diagnostics company. This role is responsible for defining the data architecture across relational and lakehouse layers, governing the structure of versioned study-level data packages, and enabling cross-modal data access for imaging, omics, and real-world data. The architect will design for GxP compliance, FAIR data principles, and scalable query performance within a client-managed AWS environment, working in close partnership with the imaging platform, workbench, and clinical data workstreams.

Key Responsibilities:

Data Modeling and Architecture
- Lead the design of the Integrated Data Package (IDP) data model, covering multi-modal study assets including DICOM imaging, omics, and real-world data sources
- Define the two-layer data architecture: operational relational layer for study metadata, cataloging, and access registry; lakehouse layer for versioned study assets at scale
- Design schemas, partitioning strategies, and table formats across relational (PostgreSQL) and open table format (Apache Iceberg) layers to support both transactional and analytical access patterns
- Establish cross-modal patient and study linkage standards, including integration with the Global Unique Patient Record Identifier (GUPRI) and related master data entities
- Define data versioning and snapshot strategies for study-level packages, enabling reproducible dataset construction for algorithm development and regulatory submissions

Lakehouse and Query Layer
- Architect the Apache Iceberg-based lakehouse layer on S3, including table design, schema evolution governance, compaction policies, and metadata management
- Design the version catalog architecture using Project Nessie or equivalent catalog tooling, covering namespace structure, branching strategy, and atomic snapshot tagging
- Define query access patterns and optimization strategies across the lakehouse layer using distributed SQL query engines
- Govern the data access API surface exposed to downstream consumers including the algorithm development workbench and reporting services

FAIRification and Data Governance
- Design proactive FAIRification pipelines that enrich incoming study data with standardized metadata, controlled vocabularies, and linkage keys at ingestion time
- Define data quality validation rules, error handling workflows, and observability hooks across the ingestion and enrichment pipeline
- Establish data lineage and provenance tracking across the full data lifecycle from ingestion through version snapshot to analytical consumption
- Ensure data architecture supports GxP audit trail requirements including ALCOA+ principles for traceability, integrity, and contemporaneity

Stakeholder Collaboration and Governance
- Serve as the primary data architecture authority for the program, partnering with imaging platform, workbench, and regulatory workstreams on cross-cutting data decisions
- Engage directly with client data, engineering, and architecture stakeholders to align on data models, access patterns, and governance standards
- Produce and maintain architecture artifacts including data models, schema documentation, ADRs, and data dictionary
- Contribute to milestone delivery planning, technical risk management, and program-level architecture reviews

Required Qualifications:
- 10+ years of experience in data architecture, data engineering, or enterprise data platform design
- Expert-level proficiency in relational data modeling (PostgreSQL or equivalent), including schema design, normalization, JSONB/semi-structured patterns, and query optimization
- Hands-on experience designing and operating modern lakehouse architectures using Apache Iceberg or equivalent open table formats (Delta Lake, Apache Hudi)
- Strong background in distributed query engines (Presto, Trino, Spark SQL, or equivalent) and large-scale data partitioning strategies
- Experience with data versioning concepts including snapshot isolation, time travel, schema evolution, and catalog management
- Demonstrated experience delivering data platforms in regulated environments with GxP, 21 CFR Part 11, or equivalent compliance requirements
- Strong written and verbal communication skills, with the ability to document data models and architecture decisions for mixed technical and regulatory audiences

Nice to Have:
- Hands-on experience with Project Nessie or equivalent transactional catalog tooling for Iceberg
- Background in medical imaging data (DICOM) or multi-modal clinical data integration including omics or real-world data
- Familiarity with FAIR data principles and their application to life sciences data platforms
- Experience with workflow orchestration tools (Apache Airflow, Temporal, or equivalent) in the context of data pipeline design
- Prior experience in a fixed-fee, milestone-based delivery engagement within a large regulated enterprise environment