Observability Engineer

Role Title: Observability Engineer
Employment Type: Contract
Duration: 6 Months (Potential Extension)
Location: Cleveland, OH area – Hybrid (4 days onsite / 1 day remote)

About the Role

We are seeking an experienced Observability Engineer to support and expand a centralized enterprise observability platform. This initiative is focused on building a true “single pane of glass” monitoring environment using modern telemetry and monitoring technologies including Prometheus, Grafana, and Loki.

The current environment captures approximately 50% of server telemetry and is now evolving to include cross-domain observability across infrastructure, applications, databases, storage, and business transaction data. Long-term goals include enabling AI/ML-driven anomaly detection and intelligent root-cause analysis.

This is an opportunity to play a key role in building an enterprise-wide operational intelligence platform.

Responsibilities

- Expand telemetry ingestion across infrastructure, databases, storage platforms, applications, and network environments
- Assist with onboarding remaining systems and extending monitoring beyond traditional OS metrics
- Build and enhance Grafana dashboards that correlate infrastructure health with application performance and business transaction metrics
- Develop and maintain synthetic monitoring scripts using Playwright or similar tools to simulate critical user journeys
- Configure and optimize alerting workflows using Alertmanager and Loki
- Improve signal-to-noise ratio and reduce alert fatigue through better event management practices
- Establish and maintain telemetry labeling standards and data quality practices
- Support troubleshooting, root-cause analysis, and operational documentation efforts
- Partner with engineering and infrastructure teams to drive observability best practices across the enterprise

Required Qualifications

- Hands-on experience with Prometheus, Grafana, Loki, and Alertmanager
- Strong experience writing PromQL queries and building Grafana dashboards
- Experience designing or supporting enterprise observability and monitoring platforms
- Ability to collect and normalize telemetry across servers, databases, storage environments, networks, and applications
- Experience with synthetic monitoring tools such as Playwright or Selenium
- Strong Linux command-line experience
- Experience editing and managing YAML and JSON configuration files
- Knowledge of alert routing, escalation workflows, and reducing alert fatigue
- Understanding of telemetry standards, labeling strategy, and data hygiene practices
- Strong troubleshooting and analytical skills

Preferred Qualifications

- Oracle and SQL database experience
- Experience with SNMP, network flow data, or infrastructure performance monitoring
- Exposure to AI/ML-based observability or anomaly detection initiatives

This role offers the opportunity to help shape the future of enterprise monitoring and observability while working on high-impact initiatives supporting large-scale infrastructure and application environments.