Lead Site Reliability Engineer
A company is looking for a Lead Site Reliability Engineer to own reliability outcomes for a modern SaaS platform.
Key Responsibilities
Define and drive reliability strategy across control-plane and data-plane systems
Establish and operationalize SLOs, SLAs, and error budgets
Lead incident management and drive systemic fixes that improve long-term reliability
Required Qualifications
6+ years of experience leading the delivery of complex distributed systems or SaaS platforms
Strong experience with multi-region, split-plane architectures
Proven track record of improving SLO attainment, MTTR, and system reliability at scale
Proficiency in programming languages such as Python, Java, C++, or JavaScript
Deep experience with Kubernetes, CI/CD, and Infrastructure as Code