SRE Architect

Position Overview

We are seeking an experienced SRE Architect to design, implement, and optimize highly reliable, scalable, and resilient cloud infrastructure. The ideal candidate will bring deep expertise in AWS, Kubernetes, and Terraform, along with a strong foundation in automation, observability, and DevOps practices.

Key Responsibilities

- Architect and implement scalable, fault-tolerant systems on AWS.
- Design and manage containerized environments using Kubernetes.
- Develop and maintain Infrastructure as Code (IaC) using Terraform.
- Establish and enforce SRE best practices including SLIs, SLOs, and error budgets.
- Build and enhance CI/CD pipelines for reliable and automated deployments.
- Implement monitoring, logging, and alerting solutions to ensure system health and performance.
- Drive incident management, root cause analysis (RCA), and postmortem processes.
- Collaborate with development, DevOps, and security teams to improve system reliability.
- Optimize cloud cost, performance, and scalability.

Required Skills & Qualifications

- 8+ years of experience in Site Reliability Engineering / DevOps / Cloud Architecture.
- Strong hands-on experience with AWS (EC2, EKS, S3, VPC, IAM, RDS, etc.).
- Deep expertise in Kubernetes (cluster management, scaling, networking).
- Proficiency in Terraform for infrastructure automation.
- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, etc.).
- Strong knowledge of Linux systems and scripting (Bash, Python, or Go).
- Experience with monitoring tools like Prometheus, Grafana, CloudWatch, or Datadog.
- Solid understanding of networking, security, and distributed systems.