Platform Architect
ABOUT USEFULBI

UsefulBI is a global AI and data transformation partner helping enterprises turn data and technology into competitive advantage. With 600+ successful projects and AWS Generative AI Competency recognition, we serve Fortune 500 clients across Life Sciences, Financial Services, and Technology.

ROLE OVERVIEW

We are seeking an experienced Data Platform Architect to lead design and implementation of an enterprise-grade, AI-ready data platform spanning cloud storage, data engineering pipelines, governance-ready data products, and GenAI / RAG workloads. The ideal candidate brings deep expertise with AWS-native services, Databricks, and modern data stack tools, with a strategic mindset to architect platforms delivering governed, AI-consumable datasets at scale.

KEY RESPONSIBILITIES

1. Storage & Platform Foundation
- Architect the AWS Data Platform: Amazon S3, Athena, Redshift, and AWS Lake Formation for governed access
- Design and operate Databricks Lakehouse (Delta Lake, Unity Catalog, Databricks Compute)
- Ensure secure, scalable, and cost-optimized foundational infrastructure aligned to enterprise SLAs

2. Processing & Engineering Layer
- Lead data engineering pipelines using dbt (transformations & modeling) and Databricks (ML Runtime, Notebooks)
- Implement Data Quality & Testing using SODA; design Workflow Orchestration with monitoring and alerting
- Drive a shift-left engineering culture with reusable, well-documented pipelines

3. Data Products Layer: AI-Ready Datasets
- Define and build domain-owned, AI-ready Data Products (Customer 360, Patient 360, Clinical, Commercial, Feature-Ready, and RAG-Ready datasets)
- Champion a Data Mesh / Data Product mindset with clear ownership, SLAs, and discoverability

4. Governance & Catalog Layer
- Implement unified data governance using Atlan (Governance & Catalog) for discoverability, lineage, trust, and control
- Establish end-to-end Data Lineage tracking (data to production) using classification tagging
- Define and enforce Policies & Access controls: RBAC/ABAC, PII/PHI tagging, sensitive data classification, and policy enforcement
- Drive AI Governance practices: AI data usage policies, model lineage, bias detection, and audit readiness
- Ensure Quality & Trust standards: data quality SLAs, certification workflows, and DQ dashboards

5. AI & Advanced Analytics Enablement
- Architect data infrastructure supporting ML Models (predictive, prescriptive, optimization) and GenAI/RAG pipelines
- Design pipelines for LLM apps, copilots, and knowledge search; support real-time Decision Systems
- Collaborate with domain teams to build Domain-Owned AI & ML models on governed, trusted data

6. Cross-Cutting Capabilities
- Security & Compliance: End-to-end data security, IAM, encryption, and audit-ready regulatory compliance (HIPAA, GDPR, SOC 2)
- Monitoring & Observability: Platform-level data and model observability, pipeline health dashboards
- Collaboration: Tools and practices to connect people, knowledge, and data across the organization

REQUIRED SKILLS & QUALIFICATIONS

Cloud & Storage
- 10+ years of experience in data architecture and cloud platforms
- Deep expertise with AWS services: S3, Athena, Redshift, Lake Formation, Glue, IAM
- Proficiency in Databricks: Delta Lake, Unity Catalog, Databricks Workflows, MLflow
- Experience with data lakehouse architecture patterns and medallion architecture (Bronze / Silver / Gold)

Data Engineering & Transformation
- Strong hands-on expertise with dbt (data build tool) for SQL-based transformations and data modeling
- Pipeline orchestration experience: Apache Airflow, Databricks Workflows, or equivalent
- Data quality tooling: SODA, Great Expectations, or Monte Carlo
- Proficiency in Python, SQL, and Spark

Data Governance & Catalog
- Experience implementing data catalogs and governance tools (Atlan, Collibra, Alation, or equivalent)
- Knowledge of data lineage, metadata management, classification frameworks
- Understanding of RBAC/ABAC, PII/PHI regulations, and policy enforcement at the platform level

AI & GenAI Readiness
- Experience building Feature Stores and ML-ready datasets for model training and inference
- Familiarity with RAG (Retrieval-Augmented Generation) architecture: chunking, embedding generation, vector stores
- Understanding of LLMOps, model observability, and AI governance frameworks
- Experience with vector databases (Pinecone, Weaviate, pgvector, or Databricks Vector Search)

Architecture & Leadership
- Proven ability to design and document enterprise data architecture using frameworks like TOGAF or Zachman
- Strong stakeholder engagement skills, with the ability to translate business needs into technical architecture
- Experience working in regulated industries (Life Sciences, Financial Services, Healthcare) is a strong plus
- AWS Certified Solutions Architect (Professional) or Databricks Certified Professional preferred

NICE TO HAVE

- Experience with ThoughtSpot, Tableau, or Power BI for self-service BI layer integration
- Familiarity with SAS Viya or clinical analytics platforms
- Knowledge of FHIR, HL7, or CDISC data standards for Life Sciences
- Contributions to open-source data engineering projects
- Experience with multi-cloud or hybrid-cloud architectures

WHAT WE OFFER

- Opportunity to architect one of the most comprehensive AI-ready data platforms in the industry
- Exposure to Fortune 500 clients across Life Sciences, Financial Services, and Technology
- AWS Generative AI Competency partner: work on cutting-edge GenAI and LLM use cases
- Collaborative, innovation-first culture with clear career growth path
- Competitive compensation, flexible work arrangements (remote/hybrid available)
- Access to certifications, training, and industry conferences
- Comprehensive Benefits: Medical, Dental, and Vision insurance coverage for employees and eligible dependents
- Retirement Benefits: 401(k) retirement savings plan with company benefits as per policy