Google Cloud Data Architect & IAM Data Modernization
Role: Google Cloud Data Architect – IAM Data Modernization
Location: Dallas, TX / Charlotte, NC / Iselin, NJ / Chandler, AZ / Ohio, Delaware (Hybrid)
Must be a US Citizen or GC holder.

About Position:
Identity & Access Management (IAM) Data Modernization: migration of an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud (GCP), enabling metrics and reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross-domain trend analysis). The work leverages PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift (OCP) to deliver scalable, secure, and high-performance data solutions.

What You'll Do:

DevOps / CI-CD
- Experience implementing CI/CD pipelines for data and analytics workloads
- Familiarity with Git-based source control, build automation, and deployment strategies

Containers & Platform
- Experience with OpenShift Container Platform (OCP) for deploying data workloads and services
- Understanding of containerized architecture, scaling, and environment management
- Proven ability to build CI/CD pipelines for data and infrastructure workloads
- Experience managing secrets securely using GCP Secret Manager
- Ownership of observability, SLOs, dashboards, alerts, and runbooks
- Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability

Big Data & Processing
- Hands-on experience with PySpark for ETL/ELT, data transformation, and performance optimization
- Solid understanding of distributed data processing concepts

Data & Cloud Architecture
- Strong experience designing data platforms on Google Cloud Platform (GCP)
- Experience with Data Lakes, data warehousing, and large-scale migration programs

Data Lake Architecture & Storage
- Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or layered models)
- Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls

Data Ingestion & Orchestration
- Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
- Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
- Expertise in partitioning strategies, backfills, and large-scale data organization
- Ability to design data models optimized for analytics and BI consumption
- Experience building batch and streaming ingestion pipelines using GCP-native services
- Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
- Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
- Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)
- Ability to design robust error handling, replay, and backfill mechanisms

Data Processing & Transformation
- Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
- Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control
- Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
- Advanced Python programming skills for data engineering, including testing and maintainable code design
- Experience managing schema evolution while minimizing downstream impact

Analytics & Data Serving
- Expertise in BigQuery performance optimization and data serving patterns
- Experience building semantic layers and governed metrics for consistent analytics
- Familiarity with BI integration, access controls, and dashboard standards
- Understanding of data exposure patterns via views, APIs, or curated datasets

Data Governance, Quality & Metadata
- Experience implementing data catalogs, metadata management, and ownership models
- Understanding of data lineage for auditability and troubleshooting
- Strong focus on data quality frameworks, including validation, freshness checks, and alerting
- Experience defining and enforcing data contracts, schemas, and SLAs

Good to have:

Security, Privacy & Compliance
- Hands-on experience implementing fine-grained access controls for BigQuery and GCS
- Experience with sprint planning and providing technical guidance to the team
- Strong stakeholder communication and solution-architecture skills

Expertise You'll Bring:
- Experience: 10–14+ years in DevOps and Data Architecture, including 5+ years designing on PySpark/GCP/OCP at scale; prior on-prem to cloud migration experience is a must.
- Education: Bachelor's/Master's in Computer Science, Information Systems, or equivalent experience.
- Certifications: Google Cloud Professional Cloud Architect/DevOps/OCP (required or within 3 months). Plus: Professional Data Engineer, Security Engineer.

Flexible work-from-home options available.