AI DevOps Engineer
About the role
Seeking a highly skilled AI DevOps Engineer to design and manage scalable, secure infrastructure for AI and LLM-powered applications within a regulated financial services environment. This role combines DevOps, Platform Engineering, and Site Reliability Engineering to support high-performance AI systems and workflows.

Key responsibilities
- Design, deploy, and maintain scalable infrastructure for production AI and LLM applications, ensuring high availability and security.
- Develop and manage Infrastructure-as-Code using Terraform to enable secure and repeatable deployments.
- Implement and operate Kubernetes environments to support containerized AI workloads at scale.
- Establish monitoring, alerting, and incident response procedures to maintain system reliability and performance.
- Collaborate with security and compliance teams to uphold regulatory standards and improve automation processes for AI infrastructure.

Required skills and experience
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- Proven experience in DevOps, Platform Engineering, or Site Reliability Engineering roles managing large-scale production infrastructure.
- Strong expertise with Terraform and Infrastructure-as-Code methodologies.
- Hands-on experience deploying and operating Kubernetes clusters in production environments.
- Experience supporting AI platforms or LLM-based workloads with focus on automation, scalability, and cloud-native architectures.

Nice to have
- Experience supporting production-grade LLM applications and AI agent workflows.
- Familiarity with vector databases such as Pinecone, Weaviate, or PostgreSQL with pgvector.
- Exposure to AI developer tooling and internal platform support.
- Understanding of observability, monitoring, and capacity planning for AI/ML systems.
- Experience within financial services or other regulated industries and strong cross-functional communication skills.

Location
New York, NY, United States