Senior Data Engineer (Python/PySpark/AWS)
Candidates must be local to the Washington, D.C. metro area.

About Infinitive:
Infinitive is a data and AI consultancy that enables its clients to modernize, monetize and operationalize their data to create lasting and substantial value. We possess deep industry and technology expertise to drive and sustain adoption of new capabilities. We match our people and personalities to our clients' culture while bringing the right mix of talent and skills to enable high return on investment.

Infinitive has been named one of the “Best Small Firms to Work For” by Consulting Magazine 7 times, most recently in 2024. Infinitive has also been named a Washington Post “Top Workplace”, Washington Business Journal “Best Places to Work”, and Virginia Business “Best Places to Work.”

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will play a crucial role in designing, developing, and maintaining our clients' data infrastructure. Your expertise in Python, PySpark, ETL processes, and CI/CD (Jenkins or GitHub Actions), along with your experience with both streaming and batch workflows, will be essential in ensuring the efficient flow and processing of data to support our clients.

Responsibilities:

Data Architecture and Design:
- Collaborate with cross-functional teams to understand data requirements and design robust data architecture solutions
- Develop data models and schema designs to optimize data storage and retrieval

ETL Development:
- Implement ETL processes to extract, transform, and load data from various sources
- Ensure data quality, integrity, and consistency throughout the ETL pipeline

Python and PySpark Development:
- Utilize your expertise in Python and PySpark to develop efficient data processing and analysis scripts
- Optimize code for performance and scalability, keeping up to date with the latest industry best practices

Data Integration:
- Integrate data from different systems and sources to provide a unified view for analytical purposes
- Collaborate with data scientists and analysts to implement solutions that meet their data integration needs

Streaming and Batch Workflows:
- Design and implement streaming workflows using PySpark Streaming or other relevant technologies
- Develop batch processing workflows for large-scale data processing and analysis

CI/CD Implementation:
- Implement and maintain continuous integration and continuous deployment (CI/CD) pipelines using Jenkins or GitHub Actions
- Automate testing, code deployment, and monitoring processes to ensure the reliability of data pipelines

Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field
- 7+ years of proven experience as a Data Engineer or similar role
- Strong programming skills in Python and expertise in PySpark for both batch and streaming data processing
- Hands-on experience with ETL tools and processes
- Familiarity with CI/CD tools such as Jenkins or GitHub Actions
- Solid understanding of data modeling, database design, and data warehousing concepts
- Excellent problem-solving and analytical skills
- Strong communication and collaboration skills

Preferred Skills:
- Knowledge of cloud platforms such as AWS, Azure, or Google Cloud
- Experience with version control systems (e.g., Git)
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Understanding of data security and privacy best practices

Infinitive is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.