Senior DevOps Engineer - AI Healthcare Leader
Staff DevOps Engineer - Cloud Infrastructure, Kubernetes & AI Platform Operations

This opportunity is with a client of Andiamo, an innovative healthcare technology organization building AI-driven digital platforms that support patients, providers, and enterprise healthcare systems at scale.

About The Opportunity

We are seeking a highly experienced Staff DevOps Engineer to help lead the evolution of a modern cloud infrastructure environment powering mission-critical healthcare and AI applications. This is a senior-level engineering role designed for someone who thrives in complex distributed systems, enjoys solving large-scale operational challenges, and wants meaningful ownership over platform reliability, scalability, and infrastructure strategy.

You will play a central role in designing and operating cloud-native infrastructure across both internal platforms and enterprise partner environments. The ideal candidate combines deep Kubernetes expertise with strong cloud engineering capabilities, infrastructure-as-code experience, and a passion for building resilient, secure, and highly automated systems.

This role also offers the opportunity to work at the intersection of DevOps, AI infrastructure, platform reliability, and healthcare technology in a highly collaborative and fast-moving environment.

What You’ll Be Responsible For

Cloud Infrastructure & Platform Engineering

Lead the design, implementation, and ongoing optimization of Kubernetes-based infrastructure environments supporting large-scale production applications and enterprise integrations.
Architect and maintain cloud-native systems across multi-cloud environments, ensuring scalability, reliability, security, and operational efficiency.
Develop and enhance reusable infrastructure-as-code modules using Terraform across cloud providers and supporting services.
Drive improvements to deployment pipelines, automation frameworks, and platform tooling that enable engineering teams to ship software efficiently and safely.

CI/CD, Automation & Developer Enablement

Design and maintain enterprise-grade CI/CD workflows and reusable pipeline frameworks that support secure and scalable software delivery.
Support GitOps-based deployment strategies and operational workflows across engineering teams.
Own and maintain critical infrastructure services running within Kubernetes environments, including deployment automation, ingress systems, observability tooling, and operational support platforms.
Continuously improve developer productivity, deployment reliability, and operational visibility through automation and platform enhancements.

Security, Compliance & Reliability

Implement and support infrastructure security controls, secrets management strategies, container security scanning, and software supply chain protections.
Partner with internal teams on compliance initiatives for regulated environments, including healthcare regulations and security-focused operational standards.
Lead disaster recovery readiness initiatives, including failover testing, operational runbooks, resiliency planning, and recovery validation exercises.
Monitor, troubleshoot, and improve production reliability while participating in operational incident response and daytime on-call rotations.

AI Infrastructure & Operational Innovation

Contribute to the development of next-generation AI-powered operational tooling and intelligent infrastructure automation.
Help evaluate and implement emerging technologies that improve observability, operational scalability, and platform intelligence.
Support environments involving AI workloads, high-performance infrastructure, and advanced cloud orchestration patterns.

Leadership & Cross-Functional Collaboration

Mentor engineers across the DevOps and infrastructure organization while helping establish operational standards and engineering best practices.
Partner closely with software engineering, security, product, and platform teams to drive infrastructure initiatives and long-term technical strategy.
Provide technical leadership on complex platform projects spanning cloud architecture, reliability engineering, automation, and enterprise integrations.

What You Bring

Required Qualifications

5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering
Deep expertise with Kubernetes and cloud-native operational tooling
Strong hands-on experience with Helm, ArgoCD, Helmfile, cert-manager, Kyverno, NGINX Ingress, and related Kubernetes ecosystem technologies
Extensive experience designing and operating infrastructure on Google Cloud Platform, including GKE, IAM, Cloud SQL, storage services, and identity management
Advanced Terraform experience, including modular infrastructure design, multi-environment deployments, and infrastructure testing practices
Strong experience with GitLab CI/CD pipelines, GitOps methodologies, and deployment automation
Programming proficiency in Python and/or Go
Experience supporting infrastructure security, secrets management, and compliance-focused operational environments
Strong troubleshooting, monitoring, and production operations experience
Ability to lead complex infrastructure initiatives across multiple engineering teams

Preferred Qualifications

Advanced scripting experience using Bash or similar tooling
Experience with Vault, Akeyless, or other enterprise secrets management platforms
Operational experience with PostgreSQL, Redis, or MongoDB administration and disaster recovery planning
Experience with observability and monitoring platforms such as Datadog
Hands-on experience managing Cloudflare services, including DNS, CDN, and security policies
Experience designing and executing disaster recovery and failover testing programs
Background working in highly regulated environments, including those subject to HIPAA or SOC 2
Experience with GPU clusters, HPC infrastructure, or AI-focused operational environments
Familiarity with AI agents, intelligent automation tooling, or agentic infrastructure systems
Experience with AWS and hybrid cloud environments
Strong communication and mentorship skills

Why This Role Is Unique

This position offers the opportunity to work on highly scalable cloud infrastructure supporting AI-powered healthcare systems with real-world impact.

You’ll help shape the operational foundation of modern healthcare technology platforms while working on challenging problems involving Kubernetes, cloud reliability, security, automation, AI infrastructure, and enterprise-scale DevOps practices.

You’ll also have significant ownership, direct technical influence, and the ability to help define the next generation of platform engineering standards inside a rapidly evolving technology environment.

Work Environment & Benefits

Hybrid work model based in New York City with collaborative in-office engagement
Competitive compensation package including salary, equity, and comprehensive healthcare benefits
401(k) program and commuter benefits
Paid parental leave
Generous PTO, company holidays, sick time, and personal days
Collaborative team culture with regular company events and social programming
Opportunities for technical growth, mentorship, and long-term career advancement

If you are passionate about cloud infrastructure, operational excellence, Kubernetes ecosystems, and building resilient systems that power meaningful healthcare innovation, this role offers the chance to make a significant technical and organizational impact.

About Andiamo

Talent Partners for the AI Revolution. As a globally recognized staffing and consulting firm, we specialize in placing the top 2% of technology and go-to-market professionals with the world’s largest and most well-known companies.

For over 20 years, we've maintained the status of tier-one vendor for firms such as Palantir, Amazon, Fluidstack, Bloomberg, Relativity Space, Firefly, MasterCard, Visa, Two Sigma, Citadel, as well as other major financial services firms, elite hedge funds, Google-backed tech start-ups, and major software firms.

Our talent solutions include Permanent Placement, Contract Staffing, Executive Search, and Dedicated Recruiting Services (RPO). Find out more at www.andiamogo.com