
Infrastructure admin for AI services (Azure & AWS)

Job title: Infrastructure admin for AI services (Azure & AWS)
Location: Remote
Rate: $50/hr

Key Responsibilities

- Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure
- Manage compute and related resources such as EC2, Azure Virtual Machines, GPU instances, EKS, VPC, ECS, S3, Lambda, Route 53, and Kubernetes clusters
- Provision and configure storage, networking, and security services for AI platforms
- Ensure high availability, scalability, and reliability of AI environments
- Deploy and maintain AI/ML services such as Amazon SageMaker, Azure Microsoft Foundry, and Azure Machine Learning
- Support data scientists and ML engineers by providing optimized infrastructure for model training and deployment
- Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, ARM templates / Bicep, and Dockerfiles
- Set up and automate environment provisioning, patching, and scaling
- Deploy and manage containerized AI workloads using Docker, Kubernetes, Amazon EKS, Azure Kubernetes Service (AKS), and ECS
- Monitor system health, performance, and resource utilization using CloudWatch, Azure Monitor, and Datadog / Prometheus
- Optimize infrastructure for cost, performance, and GPU utilization
- Implement cloud security best practices, including IAM / RBAC management, network security groups, encryption, and secrets management
- Ensure compliance with organizational and regulatory standards
- Integrate AI infrastructure with CI/CD pipelines
- Support automated deployment of models and AI services

Required Qualifications

- Bachelor's degree in Computer Science, Information Systems, or a related field
- 5+ years of experience in infrastructure administration or cloud engineering
- Strong hands-on experience with AWS cloud services and Microsoft Azure cloud services
- Experience supporting AI/ML infrastructure or data platforms
- Proficiency with Linux administration and scripting (Python, Bash, PowerShell, Terraform, Terragrunt)
- Experience with Docker and Kubernetes
- Experience with GitHub Actions
- Experience with LLM infrastructure setup
- Experience working in a centralized team with triaging capabilities