
Site Reliability Engineer II

Title: Site Reliability Engineer II
Location: Alpharetta, GA (3 days a week onsite)
Duration: 6 months

Job Description:
We are seeking a skilled Site Reliability Engineer to join our team and help build, maintain, and scale our cloud-native infrastructure. You will work closely with development and operations teams to ensure our systems are reliable, scalable, and efficient. The ideal candidate is passionate about automation, observability, and infrastructure-as-code, and thrives in a collaborative, fast-paced environment.

Key Responsibilities
* Design, implement, and manage cloud infrastructure on Azure using Terraform and Terragrunt.
* Maintain and optimize Kubernetes clusters on Azure Kubernetes Service (AKS).
* Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD for GitOps deployments.
* Enhance system reliability by implementing monitoring, alerting, and observability solutions with Grafana.
* Automate operational tasks to reduce toil and improve team efficiency.
* Participate in on-call rotations, incident response, and post-mortem analysis.
* Collaborate with development teams to improve application performance, scalability, and resilience.
* Implement and advocate for SRE best practices, including SLIs, SLOs, and error budgets.
* Continuously improve system performance, cost efficiency, and security.

Required Skills & Qualifications
* 3+ years of experience in an SRE, DevOps, or cloud infrastructure role.
* Strong experience with Azure cloud services and infrastructure.
* Hands-on experience with Java, and with Terraform and Terragrunt for infrastructure-as-code.
* Proficiency with Kubernetes (preferably AKS) and container orchestration.
* Experience with CI/CD tools, especially GitHub Workflows/Actions and ArgoCD.
* Solid understanding of observability tools like Grafana (Prometheus, Loki, Tempo experience is a plus).

Education Requirements
Bachelor's degree required (Master's preferred).
