Lead Software Engineer - Site Reliability Engineer
Full-time
- Develop an SRE framework across the technology suite of applications and infrastructure, applying full-stack development skills (front-end, back-end, and database).
- Introduce enterprise capabilities, tools, and innovations that improve availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, CI/CD integration, and continuous testing (performance, smoke, regression, functional, chaos). Introduce continuous improvement, standardization, and automation, along with capabilities to conduct destructive and resiliency testing.
- Automate key SRE metrics and IT Service Operations processes, including customer impact, percentage availability of critical business flows, SLO/SLI adherence, and error budget. Automate the incident process for IT Service Operations through data integration with unified communications and alerting/notification systems, and evolve ChatOps to reduce time to recovery.
- Share support responsibilities for critical applications and customer journeys onboarded to SRE, including remediating issues through Agile practices, conducting blameless postmortems and root cause analysis, and introducing continuous improvement that solves problems once and for all, with the goal of no repeats.
- OS and Platform - AWS, Lambda, PCF, Kubernetes, OpenShift, Linux, Azure, Windows, VMware