ML Ops Engineer
Job Title: ML Ops Engineer
Location: Remote
Duration: 6+ month contract (with possible extension)

Job Description:

Building the Azure AI Instance:
This is the foundational infrastructure work that everything else depends on. The ML Ops Engineer is responsible for designing, provisioning, and maintaining the Azure environment that hosts all AI/ML workloads.

Environment Setup: Provision Azure resource groups, networking (VNets, private endpoints), and identity management (Entra ID, RBAC). Configure Azure Machine Learning workspaces with appropriate compute targets (CPU/GPU clusters, serverless endpoints).

MLOps Pipeline Design: Build end-to-end ML pipelines using Azure ML Pipelines or Fabric Data Factory. Implement model training, evaluation, registration, and deployment workflows with full versioning and reproducibility.

Security & Compliance: Implement data encryption at rest and in transit, managed identities, Key Vault integration, and network isolation. Ensure alignment with Novolex IT security policies and the AI Governance Framework.

Cost Management: Monitor Azure consumption via Cost Management + Billing. Set budgets and alerts, and implement auto-scaling policies to keep spend within the approved AI CoE budget.

Monitoring & Alerting: Configure Azure Monitor, Application Insights, and Log Analytics for infrastructure health, model drift detection, and pipeline failure alerting. Set up dashboards in Power BI or Azure Workbooks.

Data Integration & Analytics:
Microsoft Fabric serves as the unified analytics platform connecting Novolex's data estate to AI workloads. The ML Ops Engineer bridges the gap between raw enterprise data and production-ready ML features.

Lakehouse Architecture: Design and build Fabric Lakehouses using medallion architecture (Bronze/Silver/Gold layers). Ingest data from SAP, Snowflake, Azure SQL, and flat file sources via Data Factory pipelines and Shortcuts.

Semantic Models: Create Power BI semantic models on top of Gold-layer tables to enable self-service analytics for business users while ensuring AI/ML pipelines consume the same curated datasets.

Notebooks & Spark: Develop PySpark and Python notebooks in Fabric for feature engineering, data exploration, and ad-hoc analysis. Leverage Fabric's built-in MLflow integration for experiment tracking.

Data Quality & Lineage: Implement data validation rules, freshness checks, and automated lineage tracking via Microsoft Purview integration. Flag and remediate data quality issues before they impact model performance.

Real-Time Capabilities: Evaluate and implement Fabric Real-Time Analytics (KQL databases, Eventstreams) for use cases requiring near-real-time data ingestion from manufacturing systems and IoT sensors.