JOBSEARCHER

Observability Platform Engineer

You'll help design and scale observability platforms that handle telemetry from industry-leading GPU clusters and large-scale distributed systems. You'll work closely with experienced engineers to develop metrics pipelines, logging systems and tracing solutions that improve reliability and visibility across our services.

Must Haves:
- Experience with modern observability tools and frameworks, such as Prometheus, Grafana or OpenTelemetry (OTEL)
- Exposure to cloud platforms, such as AWS, Azure or Google Cloud
- Familiarity with microservices architectures and containerized environments, such as Kubernetes and Docker
- Interest in system reliability, performance engineering and platform-scale infrastructure
- Good communication and collaboration skills

Nice to Haves:
- Exposure to enterprise observability platforms, such as Datadog or Dynatrace
- Experience working with telemetry data (metrics, logs, traces) in large environments
- Proficiency in scripting or programming languages (e.g. Python, Go)
- Familiarity with Infrastructure-as-Code tools or deployment automation