Kubernetes Engineer
Description:
We are seeking a Kubernetes Engineer to help manage, support, and improve 6+ on-premises Kubernetes clusters across multiple global sites. This role is responsible for maintaining platform reliability, enhancing observability, driving infrastructure and operational improvements, and supporting cluster lifecycle management across a distributed on-prem environment. The engineer will work within a platform ecosystem that includes Charmed Kubernetes, Juju, MAAS, ArgoCD, Harbor, Prometheus, Grafana, and Ceph to ensure the environment remains stable, scalable, and efficient.

This role requires strong hands-on experience in production Kubernetes operations, with a focus on on-prem infrastructure, multi-site reliability, and continuous platform improvement. The engineer will partner with infrastructure, networking, storage, and application teams to troubleshoot issues, improve platform standards, and support the growth and maturity of Kubernetes services across all supported locations.

Key Responsibilities:
Manage and support 6+ on-prem Kubernetes clusters across multiple sites
Maintain platform reliability, availability, and operational consistency across all environments
Monitor and troubleshoot issues involving cluster health, workloads, nodes, networking, ingress, storage, and supporting platform services
Support cluster lifecycle activities including provisioning, upgrades, patching, scaling, and general maintenance
Improve observability, monitoring, and alerting using tools such as Prometheus and Grafana
Support GitOps and deployment workflows using ArgoCD
Help manage and support container image workflows and registry integrations through Harbor
Work with MAAS, Juju, and Charmed Kubernetes to support cluster and infrastructure lifecycle management in an on-prem environment
Support and improve persistent storage capabilities leveraging Ceph and related storage integrations
Drive platform enhancements that improve resiliency, scalability, security, automation, and supportability across all sites
Standardize operational processes, cluster configurations, and support models where possible
Participate in incident response, root cause analysis, and long-term reliability improvements
Create and maintain documentation for architecture, operational procedures, and best practices

Required / Preferred Experience:
Strong experience administering Kubernetes in production environments
Experience supporting multiple Kubernetes clusters across distributed or multi-site environments
Strong Linux systems administration and infrastructure troubleshooting skills
Experience with Charmed Kubernetes, Juju, and MAAS in an on-prem environment
Experience with ArgoCD or similar GitOps deployment tools
Experience with Prometheus and Grafana for monitoring, alerting, and platform observability
Experience with Harbor or similar container registries
Experience with Ceph or similar distributed storage platforms supporting Kubernetes workloads
Solid understanding of Kubernetes networking, ingress, storage, and cluster operations
Experience improving reliability, automation, and operational maturity in production platforms
Strong collaboration skills across infrastructure, platform, and application teams