Spark job migration specialist
Location: SFO, CA - Remote
Duration: Long-Term Contract

A Spark job migration specialist migrates data pipelines, JAR tasks, and analytics workloads from legacy systems (such as Hadoop/CDH or AWS EMR) to ACOS modern platforms. The work involves refactoring code (e.g., Hive to PySpark), performance testing, and upgrading jobs from Spark 2.x to 3.x.

Key Job Responsibilities
Migrate JVM workloads and spark-submit tasks to Databricks JAR tasks or Notebook tasks.
Convert existing HiveQL scripts and Oozie workflows into optimized Spark SQL or PySpark applications.
Adapt data pipelines from Azure Synapse to any cloud platform, including updating library dependencies and notebook references.
Enable and tune Adaptive Query Execution (AQE) in Spark 3 to improve shuffle performance and handle skewed joins.
Perform regression testing with validation scripts to ensure output consistency between the old and new systems.
Use spark.sparkContext.setJobDescription() to label, monitor, and troubleshoot specific Spark jobs in the UI.

Job Description/Profile
Role: Big Data Migration Engineer (Spark)
Experience: 5+ years of experience with Apache Spark (PySpark/Scala) and cloud platforms (Azure/AWS).

Requirements:
Strong experience with HDFS and the Hadoop ecosystem (Hive, Spark, HBase, MapReduce).
Experience in data migration to cloud / enterprise data platforms.
Knowledge of data ingestion tools (Sqoop, Kafka, NiFi, etc.), cloud storage (ADLS, S3, Blob Storage), and distributed processing frameworks.
SQL and performance tuning expertise.
Experience in scripting (Python, Shell, Scala).

Key Migration Focus Areas
Data Pipelines: Ensuring schema evolution, data correctness, and testing with golden datasets.
Job Definitions: Reconfiguring job properties, cluster settings, and Spark configurations.
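
For candidates wondering what the regression testing and golden-dataset validation mentioned above might look like in practice, here is a minimal sketch in plain Python. It is an illustration only, not a prescribed tool: the function names (row_fingerprint, compare_outputs), the sample rows, and the six-digit float rounding are all assumptions, and it presumes both outputs are small enough to collect as lists of dicts. A real migration would more likely run an equivalent comparison inside Spark.

```python
from collections import Counter
import hashlib


def row_fingerprint(row, float_digits=6):
    """Return a stable hash for one row (a dict of column name -> value).

    Columns are sorted so column order does not matter, and floats are
    rounded (an assumed tolerance) so tiny numeric drift between engines
    is not reported as a mismatch.
    """
    parts = []
    for column in sorted(row):
        value = row[column]
        if isinstance(value, float):
            value = round(value, float_digits)
        parts.append(f"{column}={value!r}")
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def compare_outputs(legacy_rows, migrated_rows):
    """Compare two outputs as multisets of rows, ignoring row order."""
    legacy = Counter(row_fingerprint(r) for r in legacy_rows)
    migrated = Counter(row_fingerprint(r) for r in migrated_rows)
    missing = legacy - migrated      # rows only the legacy job produced
    unexpected = migrated - legacy   # rows only the migrated job produced
    return {
        "legacy_count": sum(legacy.values()),
        "migrated_count": sum(migrated.values()),
        "missing_in_migrated": sum(missing.values()),
        "unexpected_in_migrated": sum(unexpected.values()),
        "match": not missing and not unexpected,
    }


# Hypothetical golden dataset (legacy output) and migrated output.
golden = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 0.1 + 0.2}]
migrated = [{"amount": 0.3, "id": 2}, {"id": 1, "amount": 10.5}]

report = compare_outputs(golden, migrated)
print(report)
```

Comparing multisets of row hashes, rather than row-by-row in order, reflects the fact that distributed jobs rarely guarantee output ordering. Duplicated or dropped rows still show up in the counts. A type change such as an integer column becoming a float is reported as a mismatch, which is usually the desired behavior when checking schema evolution.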