Site Reliability Engineer

Job Duties
- Partner with the architecture and development teams on how to make applications highly available, reliable, and performant at a global scale
- Collaborate with the architecture team to ensure Reliability factors are accounted for in business features and enablers
- Guide development teams in understanding established service level objectives and consequences, and implementing appropriate SLIs to support the objectives
- Collaborate with development team members to swarm, troubleshoot, and resolve problems
- Guide ad-hoc teams to brainstorm solutions and build implementation plans based on the Root Cause Analysis of production issues
- Design and build automated solutions to optimize application/service/platform uptime with minimal human intervention
- Be available for an on-call rotation to participate in troubleshooting and communication efforts outside of normal business hours
- Implement and help create standards and best practices, and mentor other team members in order to drive adoption across development teams
- Perform other duties as assigned
- Conform with all company policies and procedures

JOB SPECIFICATION
- Expert in defining, implementing, and evaluating Service Level Objectives (SLO) and Service Level Indicators (SLI), and associated consequences
- Software development expertise in two or more high-level programming and scripting languages
- Experience in evolutionary database design, query performance analysis, and indexing as a cornerstone for delivering scalable, performant products and services
- Experience in designing, building, and optimizing automated pipelines with automated testing and automated security controls
- Experience in performing Root Cause Analysis and Problem Management
- Experience working in Agile Scrum teams with demonstrated success leading improvements (getting better/faster/happier)