Senior Observability Engineer with Elastic Stack Platform SME
Job Description

We are seeking a highly experienced Senior Observability Engineer with deep expertise in ESS (Elastic Stack) to lead and accelerate the development of enterprise-grade observability capabilities across mission-critical applications.

This role requires a hands-on SME who can design, build, and scale observability dashboards, APM, tracing, and monitoring solutions exclusively within ESS. The candidate will play a key role in transforming current monitoring into a proactive, intelligent, and scalable observability ecosystem.

This is a high-impact, fast-paced engagement (target < 6 months) requiring ownership, technical depth, and execution excellence.

Key Responsibilities:

ESS Observability Architecture & Implementation
- Design and implement end-to-end observability solutions using ESS (Elastic Stack).
- Build a centralized observability layer covering all MF applications.
- Ensure block-level aggregation with drill-down to:
  - Application-level metrics
  - APM traces
  - Logs and events
  - Service dependencies

Dashboard Engineering (Critical Priority)
- Develop and scale a large backlog of ESS dashboards, including but not limited to:
  - Cluster Health (OCP/K8s)
  - API & APM Dashboards
  - Service Health & Dependency Monitoring
  - Pod Status / Restart / Scaling Metrics
  - HTTP Status Analytics (200/400/500 trends)
  - Transaction Processing Metrics
  - Infra Metrics (CPU, Memory, Disk, Network)
  - Synthetic Monitoring & Availability
- Build intuitive, drill-down dashboards from MF Block → Service → Application level.

APM, Tracing & Monitoring Expansion
- Expand ESS-based:
  - Application Performance Monitoring (APM)
  - Distributed tracing
  - Real User Monitoring (RUM)
  - Synthetic monitoring
- Enable end-to-end traceability across microservices.

Proactive Observability & Alerting
- Design and implement smart alerting rules:
  - Move from reactive → proactive detection
  - Reduce noise, improve signal quality
- Define SLOs, SLIs, and error budgets
- Enhance anomaly detection and trend analysis

Collaboration & Leadership
- Work closely with:
  - EOT Observability Team
  - Internal CDLs
  - Application teams
- Act as ESS Observability SME
- Provide guidance, standards, and best practices

Required Skills & Experience:
- Strong hands-on experience with ESS (Elastic Stack):
  - Elasticsearch
  - Logstash
  - Kibana
  - Beats / Elastic Agent
  - Elastic APM
- Proven experience building enterprise-scale observability dashboards in ESS
- Deep understanding of:
  - Microservices architecture
  - Kubernetes / OpenShift (OCP)
- Experience with:
  - APM, distributed tracing, logging, metrics correlation
- Ability to design multi-layer observability (infra → platform → app)

Strongly Preferred:
- Experience with:
  - Synthetic monitoring tools integrated with ESS
  - Real User Monitoring (RUM)
  - Service maps and dependency graphs
- Knowledge of:
  - CI/CD observability integration
  - Alerting frameworks within Elastic
- Scripting: Python / Shell / Groovy (nice to have)

Soft Skills:
- Strong ownership mindset
- Ability to work under aggressive timelines
- Excellent problem-solving skills
- Clear communication with technical and non-technical teams

Success Criteria (First 3–6 Months):
- Deliver enterprise-grade ESS observability dashboards
- Achieve full MF application visibility
- Implement end-to-end APM + tracing coverage
- Establish proactive alerting framework

Additional Notes:
- Candidate must be an ESS expert; alternative tools experience alone will not be sufficient.
- This is a high-priority, business-critical role with immediate impact expectations.

Required Skills: Elasticsearch, Logstash, Kibana, Beats, Elastic Agent, Elastic APM, Observability dashboards in ESS, Microservices architecture, Kubernetes, OpenShift, RUM, CI/CD observability integration, Alerting frameworks within Elastic, Python, Shell Scripting, Groovy scripting