Major Incident Manager (SRE)
Must be willing to work onsite from day 1 at the client office in Philadelphia, PA.

About the Role

Required skillsets include deep expertise in incident command, SRE and operations engineering, reliability architecture, automation and observability, and executive communication, along with the ability to lead cross-functional teams through high-impact outages, systemic problem resolution, and large-scale change events.

Required Skills

Lead major incident management for high-severity outages, driving resolution across cross-functional teams and ensuring minimal business impact.
Apply expertise in SRE, operations engineering, and reliability architecture to improve platform stability, governance, and operational standards.
Work on automation and observability, including monitoring, alerting, and auto-remediation, to enhance incident response efficiency.
Hands-on experience with tools like ELK (Elastic Stack), Grafana, AppDynamics, and COP, including scripting, auto-executing queries, and attaching logs, metrics, and traces to incidents.
Enhance alert intelligence by analyzing metric deviations, log patterns, and anomaly trends, and clearly communicate updates to stakeholders and leadership.