Systems Reliability Engineer
We’re looking for a Systems Reliability Engineer to own the reliability of our system across cloud, edge, and real-world environments. Our platform runs across distributed infrastructure, connecting cloud services, on-site compute, and live video/data pipelines inside warehouses. This role is responsible for making systems observable, diagnosable, and repeatable as we scale across deployments. You’ll work closely with engineering and deployment teams to ensure the system performs reliably in production, not just in ideal conditions.

What You’ll Own
Own reliability of systems across cloud (Kubernetes), edge compute, and on-site deployments
Build and maintain monitoring, alerting, and observability systems
Define and improve incident response, severity levels, and on-call processes
Improve deployment and bring-up workflows across facilities
Diagnose issues across infrastructure, networking, and distributed systems
Partner with engineering to identify root causes and prevent recurring issues
Improve system visibility, debugging, and operational tooling
Help make deployments repeatable and scalable across sites

Required Qualifications
3+ years of experience in SRE, infrastructure, or distributed systems
Strong Linux and networking fundamentals
Experience operating systems in production environments
Experience working with networking in constrained or distributed environments (e.g., VPNs, secure tunnels, on-site networking)
Experience with:
  Kubernetes and containerized systems
  Cloud platforms (GCP, AWS, or Azure)
  Observability tools (Prometheus, Grafana, OpenTelemetry, etc.)
Ability to debug issues across multiple layers of the stack (infra → services → network)
Comfortable working in real-world, imperfect environments (not just clean cloud systems)
Strong ownership and ability to drive issues to resolution

Preferred Qualifications
Experience with multi-site or edge deployments
Experience with event-driven systems (Kafka or similar)
Familiarity with video or streaming systems (RTSP, WebRTC)
Experience working with hardware-integrated systems
Exposure to security/compliance frameworks (SOC 2, ISO 27001, etc.)
US citizen or permanent resident
Located in the SF Bay Area or NY area

Why This Role Matters
We’re scaling from a small number of deployments to many, and this role is critical to making the following happen:
Systems that work outside ideal environments
Fast, reliable diagnosis and recovery when things break
Repeatable deployments across real-world facilities

Equal Opportunity Statement
We’re an equal opportunity employer that values diversity and inclusion. We welcome teammates of all backgrounds and don’t discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.

Benefits
At Claryo, we offer a competitive benefits package that supports your health and well-being, including top-tier medical, dental, and vision coverage, 401(k) with employer matching, equity, parental leave, and unlimited vacation.

Compensation Range: $150K - $170K