Senior/Staff/Principal Software Engineer – Observability Engineering

Job Description:

Own the end-to-end design and implementation of the AppGate observability fabric, from telemetry SDKs in our clients and gateways, to the LogForwarder pipeline, to customer-side integrations.

Make foundational technical decisions (transport protocols, sampling strategies, schema design, correlation models) that determine whether our platform scales gracefully to hundreds of millions of events per day.

Enable next-generation capabilities, including OpenTelemetry-Native Telemetry Fabric, High-Cardinality Data Pipeline, End-to-End Distributed Tracing, On-Demand Packet Capture, and more.

Define the telemetry schema, correlation model, transport, and sampling strategies spanning client devices, controllers, and gateways.

Validate at Customer Scale: Test in lab environments that match our largest deployments, and hunt down cardinality explosions and pipeline backpressure before customers see them.

Drive Integration Standards: Own the OTLP, Prometheus, and JSON-log compatibility surface and validate ingestion into Datadog, Splunk, Nexthink, and Elastic.

Collaborate Cross-Functionally: Work directly with product, R&D, and marquee customers in defense and critical infrastructure to shape requirements and deliver outcomes that matter.

Requirements:

8+ years of engineering experience, with at least 4 years dedicated to observability, telemetry, or large-scale data infrastructure (Datadog, Splunk, Elastic, Honeycomb, New Relic, Grafana Labs, or equivalent).

Deep OpenTelemetry expertise: OTLP, the OTel Collector, semantic conventions, context propagation, and head/tail sampling. You can debate the trade-offs in your sleep.

Distributed tracing in production: You've designed or significantly contributed to a tracing system handling real customer traffic, not just a side project.

High-throughput pipeline experience: Hands-on with systems ingesting 100M+ events per day, including backpressure handling, batching, and storage trade-offs.

Strong systems programming: Production Go and/or Rust preferred. Comfort across the stack, from agent code to backend services.

Networking and security fluency: Comfortable with TLS, DNS, TCP, and identity protocols. Prior ZTNA, SASE, or SD-WAN experience is a strong plus.

Mindset: Pragmatic, opinionated, and impact-driven. You know when to prototype and when to ship.