Site Reliability Engineer
Role: Sr SRE
Location: Atlanta, GA (onsite; local candidates only)
Type: Contract

Skills
Mandatory: CloudWatch, Dynatrace, Git, Observability
Preferred: Chaos Testing

Job Description

Qualifications
- Extensive experience supporting AWS production systems (EC2, VPC, ALB/NLB, RDS, Lambda, EKS)
- Skilled in incident management and 24x7 production support
- Proficient with monitoring tools: CloudWatch, Dynatrace, Quantum Metric
- Strong troubleshooting across infrastructure, networking, and application layers
- Familiar with CI/CD pipelines and AWS deployment processes

Responsibilities
- Provide L1/L2 support for AWS applications and infrastructure incidents
- Triage and resolve issues, restore services within SLAs, and escalate code defects with clear diagnostics
- Participate in on-call rotations, major incident bridges, and post-incident reviews
- Analyze defects, configuration issues, and anomalies from monitoring tools or user reports
- Perform regular health checks on AWS services
- Monitor system health using CloudWatch, Dynatrace, Quantum Metric, and ThousandEyes
- Respond proactively to issues with resource usage, latency, errors, and availability
- Maintain and enhance dashboards for observability