SITE RELIABILITY ENGINEERING MANAGER
Ensure Reliability of Systems that Move the Nation's Food Supply

Who We Are
United States Cold Storage owns and operates one of the most complex temperature-controlled logistics networks in North America. Every day, our systems coordinate the storage and movement of food at national scale across a network of state-of-the-art distribution centers, including multiple highly automated warehouse facilities.
We continue to advance our core warehouse and logistics platforms. Our current focus is on modular, event-driven, API-first and cloud architectures. We continue to enhance reliability and accelerate engineering productivity by strengthening our SRE and AI practices. This is a large investment in innovation to continue to drive operational excellence at our facilities.
If you want to build durable systems that operate in the physical world at scale, this is that opportunity.

The Role
The Site Reliability Engineering Manager will design and implement the company’s SRE framework from the ground up.
You will define what reliability means at US Cold.
You will establish SLIs and SLOs.
You will modernize monitoring and incident response.
You will build the playbook others will follow.
This is both a hands-on technical role and a practice-building leadership position.
You will report to the Director of IT Operations and collaborate across Software Engineering, Customer Integration Technology, Data Engineering, Infrastructure, and Security.

What You Will Own
Establish the company’s first SRE practice including principles, standards, tooling, and operational processes.
Define SLIs, SLOs, and error budgets across SaaS, on-prem, and custom services.
Build reliability dashboards and executive-level reporting.
Implement and evolve observability across logs, metrics, and distributed tracing.
Mature incident response, outage management, and post-incident review processes.
Partner with engineering to design resilient systems and reduce operational toil.
Strengthen CI/CD reliability using safe deploy strategies such as canary and blue/green patterns.
Implement cost visibility and cloud governance in partnership with Finance.
Build runbooks, playbooks, and operational standards.
Establish on-call structures and escalation clarity.
Assist in hiring, mentoring, and developing future SRE team members.
This is foundational work. The systems and practices you design will shape how engineering operates for years.

Technical Environment
Azure cloud infrastructure
Infrastructure as Code using Bicep, Terraform, or ARM
GitHub Actions for CI/CD orchestration
Safe deployment patterns including gated releases, canary, and blue/green
Observability across logging, metrics, and distributed tracing
Python scripting for automation and reliability tooling
SaaS integrations, on-prem infrastructure, and custom-built services

What We’re Looking For
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
5–7+ years in SRE, DevOps, Infrastructure, or Production Engineering.
Hands-on ownership of production services.
Proven experience implementing SLIs, SLOs, observability, and automation.
Leadership in major incident response and post-incident reviews.
Deep CI/CD expertise, particularly GitHub Actions.
Strong Python scripting for automation and operational tooling.
Practical knowledge of cloud cost optimization and FinOps principles.
Ability to influence cross-functional teams.

Why This Role Is Different
This is not an inherited SRE function.
There is no existing framework to simply maintain.

You Will
Define the reliability bar.
Build the operating model.
Influence architectural decisions.
Establish executive-level visibility into system health.
Create a culture where reliability is engineered, not reactive.
This is an opportunity to build something durable inside a company modernizing its core technology platform.

Compensation & Structure
Salary Range: $160,000.00 - $190,000.00/yr.
Bonus Eligible
Full-time, exempt
Reports to: Director of IT Operations
Travel less than 10%
Location: Hybrid, Camden, NJ

Operational Context
This role is primarily technical and office-based, with occasional interaction in operational environments depending on system needs.

Benefits Include
If annual hours are attained, these benefits may apply.
Medical, Dental, Vision, Prescription, Legal Insurance, Pet Discount, Critical Illness, Accident Insurance, Hospital Indemnity, Long Term Care + Permanent Life Insurance, Identity Theft Protection, Short Term Disability Insurance, Long Term Disability Insurance, Supplemental Disability Insurance, Basic Life Insurance, Accidental Death and Dismemberment Insurance, Supplemental Life Insurance, Supplemental Spouse Life Insurance, Child Life Insurance, Loan Solution, Health Flexible Spending Account, Dependent Flexible Spending Account, Telemedicine, Virtual Primary Care, Prescription Savings Plan, Prescription Specialty Copay Assistance Program, Weight Management Program, Chronic Condition Management, Care Navigator Program, 24/7 Nurse Line, Expert Medical Opinion, Precious Additions Maternity Program, Health Advocacy, Employee Assistance Program, Digital Cognitive Behavioral Therapy, Digital Physical Therapy, Behavioral and Mental Health Platforms, Auto and home discount program, Secure Travel Protection, Discount Programs, 401(k) plan, Education Assistance, Paid Time Off, Referral program & Commuter Benefit (NJ ONLY).

Physical & Operational Context
May require physical effort associated with using the computer to access information, or occasional standing, walking, lifting needed to carry out everyday activities.
Effective communication, vision, and hearing are essential for safety and productivity.
Operate scanners, tablets, radios, phones, computers, and other essential equipment as required.
Additional work hours may be requested by management to help manage employee production, projects, and/or special events.
Engage in frequent personal interaction and communication.
Attend in-person meetings and/or training on a regular basis.
Possess strong arithmetic and reading skills.
Follow verbal instructions, written instructions, and company policies.
Work independently and coordinate with others.
Fast-paced environment, managing stress and meeting productivity standards.

Additional Information
Job functions may vary based on the area of operation. This description outlines the most common tasks required for the job. Reasonable accommodation may be provided to enable individuals with disabilities to perform essential duties. This job description may not encompass all tasks necessary to complete the role.