PySpark / Java Developer
Job Description

Roles and Responsibilities
- Design, develop, and maintain scalable ETL pipelines for large-scale structured and unstructured data.
- Build and optimize data processing applications using PySpark and Java.
- Work extensively with relational databases and big data platforms for data extraction, transformation, and loading.
- Analyze and resolve performance bottlenecks in high-volume SQL procedures and big data processing jobs.
- Develop efficient data movement and transformation workflows across distributed systems.
- Collaborate with cross-functional teams to understand end-to-end data flow and business requirements.
- Support production systems, troubleshoot issues, and ensure data pipeline reliability.

Required Skills & Experience
- 5+ years of experience in Microsoft SQL Server and relational database development for data extraction applications.
- Strong understanding of ETL concepts, database technologies, and large-scale data processing.
- Proven experience in performance tuning of SQL queries and understanding of different indexing strategies.
- 2+ years of experience working with big data technologies including:
  - Hadoop
  - Spark / PySpark
  - Hive
  - Impala
  - Python
- 2+ years of hands-on experience with the Cloudera Hadoop Ecosystem, including:
  - HDFS
  - Hive
  - Impala
  - Spark
  - Kafka
  - Hue
  - Oozie
  - YARN
  - Sqoop
- Experience in processing large volumes of structured and unstructured data using Spark.
- Strong understanding of end-to-end (E2E) data pipeline architecture and application workflows.

Preferred Skills
- Domain experience in healthcare claims data or healthcare analytics.
- Experience with distributed data processing and optimization in production environments.
- Strong troubleshooting and analytical skills in complex data ecosystems.