Cloud Site Reliability Engineer (Contract)
Strategic Staffing Solutions, Detroit, MI
Posted 2 weeks ago

STRATEGIC STAFFING SOLUTIONS (S3) HAS AN OPENING!

Strategic Staffing Solutions is currently looking for a Cloud Site Reliability Engineer for a W2 contract opportunity with one of our largest clients!

Candidates must be willing to work on our W2 only; no C2C.

Title: Cloud Site Reliability Engineer
Location: Detroit, MI
Schedule: Hybrid
Duration: 12 Months
Role Type: W2 contract engagement

Key Skills
IaC, Terraform, major incident resolution

- At least 3 years' experience as a site reliability engineer on a cross-functional agile team working in Azure
- Working knowledge of agile development methodologies (Scrum, sprints, Kanban, etc.) and tools (Azure DevOps, etc.)
- At least 3 years' hands-on experience using the IaC tools Terraform, GitHub, Ansible, and Packer
- Proven experience across testing, integration, source code management, deployment, and containerization
- Sound problem-solving skills, with the ability to quickly process complex information and present it clearly and simply
- Experience with cloud technologies and services, including those for Compute, Storage, Databases, and API Management
- On-premises to cloud migration experience

Job Summary
The Cloud Site Reliability Engineer (SRE) works closely with the cloud development team, IT operations team, and business partners to streamline and implement enhanced monitoring and alerting capability across infrastructure and application layers. By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and enhancing system scalability and reliability. Their core focus lies in standardization and automation to build and run fault-tolerant systems. Typically, SREs possess a background in software engineering, system engineering, or system administration, coupled with substantial IT operations experience.
SREs oversee availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

Key Accountabilities
- Write and develop code to automate processes, such as analyzing logs, testing production environments, and responding to issues
- Collaborate with agile teams and business partners to develop specifications that address problems and enhancement needs, with a focus on monitoring and metrics for operational readiness
- Identify bottlenecks in development and deployment processes and design automation solutions to mitigate them
- Develop new capabilities for displaying, monitoring, and alerting on key performance indicators by tracking business transactions in real time
- Maintain and grow knowledge of platform configuration management, monitoring of established metrics, and troubleshooting
- Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements
- Design and develop alert escalation and incident response automation
- Provide production support for cloud service outages and incidents, and work on both tactical and strategic plans for outage prevention
- Provide feedback on the resiliency and maintainability of solutions to Cloud and App architects
- Conduct disaster recovery scenario generation and testing
- Implement sustainable, audit-ready processes that support information technology controls, including deployment execution, access management, audits, incident management, and related requirements

Education
Bachelor's degree

Job ID: JOB-241174
Publish Date: 09 Jun 2025