
Cloud Infrastructure Management/Lead/Architect: 12+ Years

Job Title: Cloud Infrastructure Management
Location: Bloomfield, CT (Hybrid; must go to the office 3 days)

Key Responsibilities

· Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure.
· Manage compute and related resources such as EC2, Azure Virtual Machines, GPU instances, EKS, VPC, ECS, S3, Lambda, Route 53, and Kubernetes clusters.
· Provision and configure storage, networking, and security services for AI platforms.
· Ensure high availability, scalability, and reliability of AI environments.

AI Platform Support

· Deploy and maintain AI/ML services such as:
    · Amazon SageMaker and Azure Microsoft Foundry
    · Azure Machine Learning
    · AI model training and inference environments
· Support data scientists and ML engineers by providing optimized infrastructure for model training and deployment.

Automation & Infrastructure as Code

· Implement Infrastructure as Code (IaC) using tools such as:
    · Terraform
    · CloudFormation
    · ARM templates / Bicep
    · Dockerfiles
· Automate and set up environment provisioning, patching, and scaling.

Containerization & Orchestration

· Deploy and manage containerized AI workloads using:
    · Docker
    · Kubernetes
    · Amazon EKS
    · Azure Kubernetes Service (AKS)
    · ECS

Monitoring & Performance Optimization

· Monitor system health, performance, and resource utilization using tools like:
    · CloudWatch
    · Azure Monitor
    · Datadog / Prometheus
· Optimize infrastructure for cost, performance, and GPU utilization.

Security & Compliance

· Implement cloud security best practices, including:
    · IAM / RBAC management
    · Network security groups
    · Encryption and secrets management
· Ensure compliance with organizational and regulatory standards.

CI/CD & DevOps Integration

· Integrate AI infrastructure with CI/CD pipelines.
· Support automated deployment of models and AI services.

Required Qualifications

· Bachelor's degree in Computer Science, Information Systems, or a related field.
· 5+ years of experience in infrastructure administration or cloud engineering.
· Strong hands-on experience with:
    · AWS cloud services
    · Microsoft Azure cloud services
· Experience supporting AI/ML infrastructure or data platforms.
· Proficiency with Linux administration and scripting (Python, Bash, PowerShell, Terraform, Terragrunt).
· Experience with Docker and Kubernetes.
· Experience with GitHub Actions.
· Experience with LLM infrastructure setup.
· Experience working in a centralized team with triaging capabilities.

Thanks & Regards,
Shyam (SAM)
Sr. Recruiter
Email: sam.s@navasoftware.com