JOBSEARCHER

Data Engineer - Python, AI/ML (Troy)

Key Responsibilities

- Build and maintain Python and SQL pipelines for governance-related ingestion, cleaning, transformation, and validation of structured and semi-structured data.
- Implement and operate data quality checks, schema validation, and integrity rules across pipelines; investigate and resolve quality issues.
- Contribute to master data workflows: standardization, deduplication, and consolidation of data from heterogeneous sources into consistent reference and golden-record datasets.
- Instrument pipelines for data lineage, metadata, and catalog tooling.
- Develop pipelines that feed governance dashboards and reporting in Tableau, Power BI, or Looker.
- Build reproducible, well-documented pipelines for compliance and audit reporting.
- Contribute to AI/ML-assisted governance use cases: embedding-based data classification, anomaly detection on quality metrics, LLM-assisted catalog search, and MCP-based exposure of governed datasets to AI assistants.
- Partner with team leads, data stewards, and stakeholders to translate governance requirements into engineering work.
- Follow team engineering practices: Git, code review, modular pipeline design, automated testing, CI/CD.

Required Qualifications

- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, Statistics, or a related field.
- 2+ years building data pipelines in Python (Pandas, NumPy, SciPy) and SQL.
- Working experience with Apache Spark or PySpark and workflow orchestration (Apache Airflow).
- Schema design across relational (PostgreSQL, MySQL, SQL Server) and analytical databases, including standardization across heterogeneous sources.
- Experience implementing data quality validation, EDA, and integrity enforcement on production datasets.
- Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
- Working familiarity with Python ML libraries (Scikit-Learn) for feature engineering and exploratory analysis.
- Experience producing analytics-ready datasets for BI tools (Tableau, Power BI, or Looker).
- Git, code review, and CI/CD practices.
- Clear technical communication and collaborative working style.