Data Warehouse Lead (Hybrid - New Jersey)
The Data Warehouse Lead is the primary architect and technical authority for our cloud data infrastructure. We are seeking a senior engineer with 10+ years of experience to design, implement, and scale complex data ecosystems. This role bridges business strategy and technical execution, ensuring a resilient, cost-optimized, high-performance data platform across GCP and AWS.

Beyond technical architecture, you will be at the heart of our operational excellence: managing resources, mentoring a senior engineering team, and ensuring our operations meet the highest standards of safety, quality, and efficiency.

II. Key Responsibilities

Infrastructure Strategy: Define the architectural roadmap for Data Lakehouse and Data Mesh initiatives using native cloud services.
Engineering Leadership: Set coding standards, perform peer code reviews, and mentor a team of senior data engineers in a CI/CD environment.
FinOps & Resource Management: Monitor and optimize cloud spend (BigQuery slot management, Redshift concurrency scaling, and S3/GCS lifecycle policies).
Stakeholder Alignment: Translate complex business requirements into scalable technical designs for financial, pharmaceutical, or logistics analytics (core New Jersey sectors).
Multi-Cloud Integration: Design secure, low-latency interoperability and data movement between GCP and AWS environments.

III. Engineering, Modeling & Deep Tech Stack

1. Cloud Data Architecture (GCP & AWS)

GCP Ecosystem: Expert-level management of BigQuery (partitioning, clustering, and reservation models), Dataflow (Apache Beam), and Dataproc.
AWS Ecosystem: Advanced configuration of Amazon Redshift (RA3 architecture), AWS Glue (Catalog & ETL), and S3 as the primary data lake storage layer.
Multi-Cloud Integration: Designing secure, low-latency interoperability and data movement between GCP and AWS environments.

2. Advanced Modeling & Storage Frameworks

Architectural Patterns: Mastery of Data Vault 2.0 for auditable raw layers and Star Schema (Kimball) for optimized presentation layers.
Open Table Formats: Implementation of Apache Iceberg, Hudi, or Delta Lake to provide ACID transactions and schema evolution on top of object storage.
Change Data Capture (CDC): Deployment of AWS DMS or GCP Datastream for real-time replication from legacy RDBMS (SQL Server, Oracle, Postgres).

3. Pipelines, Orchestration & DataOps

Development: Expert-level Python and advanced analytical SQL (window functions, recursive CTEs).
Transformation & Orchestration: Mandatory use of dbt (data build tool) for modular modeling and Apache Airflow (MWAA/Cloud Composer) for complex DAG orchestration.
CI/CD & IaC: Provisioning and versioning all data infrastructure using Terraform or AWS CDK.

4. Streaming & Observability

Event-Driven Design: Real-time ingestion using Apache Kafka, AWS Kinesis, or GCP Pub/Sub.
Data Reliability: Implementing automated data quality frameworks (e.g., Great Expectations) and end-to-end lineage (DataHub or Monte Carlo).
Security & Governance: Deep knowledge of IAM (RBAC/ABAC), KMS encryption, VPC Peering, and PII masking/tokenization.

IV. Experience & Qualifications

Experience: 10+ years in Data Engineering, with at least 5 years specifically architecting multi-cloud solutions (GCP/AWS).
Education: B.S. or M.S. in Computer Science, Software Engineering, or a related quantitative field.
Location: Must be based in or willing to commute to New Jersey (Jersey City, Newark, or Princeton area) on a hybrid schedule.
Certifications:
* Google Professional Data Engineer
* AWS Certified Data Engineer - Associate