Site Reliability Engineering (SRE) Tech Lead
Founded in 2017, Obsidian Security was created to close a critical gap: securing the SaaS applications where modern business happens, on platforms like Microsoft 365, Salesforce, and hundreds more.

Backed by top investors including Greylock, Norwest Venture Partners, and IVP, we’ve built a complete SaaS security platform to reduce risk, detect and respond to threats, and prevent breaches at the source. Our team includes leaders who helped define the categories of endpoint and identity security at CrowdStrike, Okta, Cylance, and Carbon Black.

Now, we’re transforming how SaaS is secured in the era of agentic AI.

Today, Obsidian is trusted by global enterprises like Snowflake, T-Mobile, and Pure Storage. We protect more than 200 organizations across North America, Europe, the Middle East, Southeast Asia, Australia, and New Zealand, including many of the world’s largest Fortune 1000 and Global 2000 companies.

With strong global momentum, a growing partner ecosystem including SentinelOne, Databricks, and Google Cloud, and a major fundraise on the horizon, we’re scaling quickly toward long-term growth and IPO readiness. Join us as we define the future of SaaS security!

Role Overview

As the SRE Tech Lead at Obsidian, you will define and build the reliability foundation for a complex, multi-tenant SaaS platform serving enterprise and financial customers. You will operate as a peer to the DevOps and Platform Engineering leads, driving a unified reliability strategy across the organization.

Your core mandate: ensure Obsidian detects every system failure before customers do, and communicates proactively when issues arise.

This is a hands-on technical leadership role with high ownership and visibility, reporting directly to the CTO.
You will architect and implement systems that handle real-world complexity: upstream SaaS dependencies, sparse and noisy data, and mission-critical enterprise workloads.

Key Responsibilities

- Map and instrument critical system paths for top-tier enterprise customers
- Build connector health models to classify issues:
  - Internal defects (“our bug”)
  - Upstream SaaS outages
  - Expected sparse/low-signal scenarios
- Establish tiered incident communication:
  - Public status page for all customers
  - Direct outreach for high-priority accounts
- Define and begin rollout of SLI/SLO standards across microservices
- Develop self-service instrumentation tooling enabling engineering teams to own observability
- Implement baseline-aware anomaly detection across all connectors (beyond static thresholds)
- Mature incident response processes, including:
  - Structured post-mortems
  - Continuous reliability improvements

Required Qualifications

- 7+ years in SRE, production engineering, or similar roles
- 2+ years operating as a technical lead
- Deep expertise with:
  - AWS and/or GCP
  - Kubernetes, Helm
  - Observability stack (Prometheus, Grafana)
  - CI/CD systems (GitLab CI/CD, ArgoCD)
- Proven experience building monitoring for multi-tenant SaaS systems with complex data pipelines
- Strong debugging skills across distributed microservices and legacy systems
- Hands-on engineering mindset: able to instrument services directly, not just configure tooling
- Track record of building or significantly improving incident detection and response systems

Preferred Qualifications

- Experience in B2B SaaS serving enterprise or financial customers
- Familiarity with third-party SaaS connector ingestion patterns
- Experience building anomaly detection systems or baseline-aware alerting
- Experience implementing customer-facing status pages and incident communication frameworks

Why This Role

- Direct impact: Work closely with the CTO and shape company-wide reliability strategy
- Greenfield opportunity: Build a detection and reliability platform from the ground up
- Technically challenging: Solve for multi-tenant systems with upstream dependencies and sparse data
- High stakes: Protect systems relied upon by major financial institutions

What Success Looks Like

- Obsidian consistently detects and diagnoses issues before customers are impacted
- Clear, proactive communication builds customer trust during incidents
- Engineering teams independently own observability through scalable tooling
- Reliability becomes a measurable, continuously improving capability across the platform

If you’re excited about building systems that make failure predictable and invisible to customers, this role offers both the challenge and the ownership to do it right.

Employee Benefits

Our competitive benefits packages are designed to support our employees' well-being, both at work and at home. Our US-based employees enjoy:

- Competitive compensation with equity and 401k
- Comprehensive healthcare with dental and vision coverage
- Flexible paid time off and paid holiday time off
- 12 weeks of new parent or family leave
- Personal and professional development resources

For more details on our US benefits, or for information on our international benefits, please see here.

Pay Transparency

Please note that the base pay range is a guideline and, for candidates who receive an offer, the base pay will vary based on factors such as work location, as well as the knowledge, skills, and experience of the candidate. In addition to a competitive base salary, this position is eligible for equity awards and may be eligible for sales commission or incentive compensation based on the role or function within the company.

At Obsidian, we are proud to be an equal-opportunity employer. We value diversity and hire for talent, passion, and compassion. In compliance with federal law, all persons hired will be required to submit satisfactory proof of identity and legal authorization.
If you have a need that requires accommodation, please contact accommodations@obsidiansecurity.com.

Information collected and processed as part of any job applications you choose to submit is subject to Obsidian’s Applicant Privacy Policy.

Base Salary Range: $250,000 USD - $280,000 USD