Site Reliability Engineering Manager

Job Title: Production Support Analyst or SRE
Duration: 12+ Months Contract
Mode: Hybrid, 3 days in office
Location: South Jordan, Utah 84095

Interview Process:
- 1st: Zoom
- 2nd: Onsite

1) As a Production Support Analyst, your responsibilities will include, but not be limited to:
- Monitoring for and resolving issues across the entire tech stack: hardware, software, application, and network. A majority of your time will be devoted to production support activities.
- Working closely with engineering/development teams to address repetitive issues, reduce operational effort, and lower the likelihood of future service disruptions.
- Partnering with business users and other technology teams to manage significant events such as business continuity/disaster recovery tests, IPOs, stock splits, and major infrastructure changes.
- Defining and refining standard operating procedures for everything from monitoring to troubleshooting complex code and infrastructure issues.
- Identifying and driving opportunities to improve platform supportability through advocating for reliability priorities in application design reviews and operational readiness exercises for new and existing services.
- Participating in weekend and off-hours on-call.
- Collaborating and striving to understand business users' needs and problems.

2) As a System Reliability Analyst, your responsibilities will include, but not be limited to:
- Working closely with engineering/development teams to design, build, optimize, and maintain systems.
- Troubleshooting issues across the entire technology stack: hardware, software, application, and network.
- Aggressively targeting toil and operational risk, and deploying solutions to reduce these.
- Broadening infrastructure and application observability.
- Proactively identifying and addressing active or potential risks to system reliability.
- Advocating for reliability priorities in application design reviews and operational readiness exercises for new and existing services.

Qualifications:

What skills and experience do I need?

You should apply if you have at least a Bachelor's degree in Computer Science or other technical discipline(s), plus hands-on experience with any combination of the following:
- 3-5+ years of practical experience in production systems support or application development.
- Hands-on experience managing systems in a large-scale distributed Unix/Linux environment is essential.
- Automation-related experience is required, using scripting languages such as Python, bash, Perl, and/or Ruby. Higher-level compiled languages such as C++, C#, Java, Scala, and Go are a big plus.
- Deep knowledge of and hands-on experience applying the principles of System/Site Reliability Engineering (SRE).
- Practical experience designing and instrumenting SLO/SLI dashboards is particularly valuable.
- Hands-on experience with enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace.
- Experience with Puppet, Ansible, Chef, GitHub, or any automation/configuration/release management tools.
- Awareness of, and ability to reason through, modern software and systems architectures, including load balancing, databases, queueing, caching, distributed systems failure modes, microservices, cloud, etc.
- Working ability to interact with message transport platforms and protocols (MQ, CPS, XML, FIX) and distributed database technologies (DB2, Sybase, Mongo, Greenplum, Postgres, KDB).
- Autosys scheduling and batch processing concepts.
- Deep understanding of infrastructure and operating system concepts such as processes, memory allocation, and networking, with an understanding of how applications are affected by them, and the ability to debug and troubleshoot accordingly.