JOBSEARCHER

Senior Observability Engineer (ESS Platform SME)

Headway Tek
McLean, VA
May 11th, 2026
Job Title: Senior Observability Engineer (ESS Platform SME)
Location: McLean, VA & Plano, TX (onsite)
Job Type: C2C or W2

Role Overview
We are seeking a highly experienced Senior Observability Engineer with deep expertise in ESS (Elastic Stack) to lead and accelerate the development of enterprise-grade observability capabilities across mission-critical applications.

This role requires a hands-on SME who can design, build, and scale observability dashboards, APM, tracing, and monitoring solutions exclusively within ESS. The candidate will play a key role in transforming current monitoring into a proactive, intelligent, and scalable observability ecosystem.

This is a high-impact, fast-paced engagement (target …) requiring ownership, technical depth, and execution excellence.

Key Responsibilities

ESS Observability Architecture & Implementation
- Design and implement end-to-end observability solutions using ESS (Elastic Stack).
- Build a centralized observability layer covering all MF applications.
- Ensure block-level aggregation with drill-down to:
  - Application-level metrics
  - APM traces
  - Logs and events
  - Service dependencies

Dashboard Engineering (Critical Priority)
- Develop and scale a large backlog of ESS dashboards, including but not limited to:
  - Cluster Health (OCP/K8s)
  - API & APM Dashboards
  - Service Health & Dependency Monitoring
  - Pod Status / Restart / Scaling Metrics
  - HTTP Status Analytics (200/400/500 trends)
  - Transaction Processing Metrics
  - Infra Metrics (CPU, Memory, Disk, Network)
  - Synthetic Monitoring & Availability
- Build intuitive drill-down dashboards that go from the MF Block level to the Service level to the Application level.

APM, Tracing & Monitoring Expansion
- Expand ESS-based:
  - Application Performance Monitoring (APM)
  - Distributed tracing
  - Real User Monitoring (RUM)
  - Synthetic monitoring
- Enable end-to-end traceability across microservices.

Proactive Observability & Alerting
- Design and implement smart alerting rules:
  - Move from reactive to proactive detection
  - Reduce noise and improve signal quality
  - Define SLOs, SLIs, and error budgets
  - Enhance anomaly detection and trend analysis

Collaboration & Leadership
- Work closely with:
  - EOT Observability Team
  - Internal CDLs
  - Application teams
- Act as the ESS Observability SME.
- Provide guidance, standards, and best practices.

Required Skills & Experience
- Strong hands-on experience with ESS (Elastic Stack):
  - Elasticsearch
  - Logstash
  - Kibana
  - Beats / Elastic Agent
  - Elastic APM
- Proven experience building enterprise-scale observability dashboards in ESS
- Deep understanding of:
  - Microservices architecture
  - Kubernetes / OpenShift (OCP)
- Experience with:
  - APM, distributed tracing, logging, and metrics correlation
- Ability to design multi-layer observability (infra → platform → app)

Strongly Preferred
- Experience with:
  - Synthetic monitoring tools integrated with ESS
  - Real User Monitoring (RUM)
  - Service maps and dependency graphs
- Knowledge of:
  - CI/CD observability integration
  - Alerting frameworks within Elastic
- Scripting: Python / Shell / Groovy (nice to have)

Soft Skills
- Strong ownership mindset
- Ability to work under aggressive timelines
- Excellent problem-solving skills
- Clear communication with technical and non-technical teams

Success Criteria (First 3-6 Months)
- Deliver enterprise-grade ESS observability dashboards
- Achieve full MF application visibility
- Implement end-to-end APM and tracing coverage
- Establish a proactive alerting framework

Additional Notes
- The candidate must be an ESS expert; experience with alternative tools alone will not be sufficient.
- This is a high-priority, business-critical role with immediate impact expectations.