Site Reliability Engineer Architect (Full-time Only)
Role: Senior SRE
Location: Austin, Texas (Hybrid)
Experience: 12+ years
Job Type: Full-time

Role Summary

We are seeking a Senior SRE with strong expertise in Unified Observability, proactive detection, AIOps, and GenAI-driven operations to support complex, distributed financial services platforms. The role requires hands-on experience designing SLI/SLO-driven monitoring, dynamic thresholds, intelligent alerting, and AI/ML-based anomaly detection across multi-stream architectures.

Key Responsibilities

Observability & Reliability Engineering
- Design and implement unified observability dashboards across metrics, logs, traces, events, and topology
- Define and manage SLIs, SLOs, and error budgets aligned to business outcomes
- Build actionable dashboards for operations, engineering, and leadership
- Implement alerting strategies using static and dynamic thresholds

Proactive Detection & AIOps
- Leverage AI/ML/AIOps to detect anomalies, predict incidents, and reduce MTTR
- Transition monitoring from reactive alerts to proactive insights
- Implement noise reduction, alert correlation, and root cause analysis
- Apply baseline modeling, seasonality detection, and anomaly scoring

Distributed Systems & Dependency Analysis
- Monitor and troubleshoot multi-service architectures involving:
  - Microservices
  - Downstream APIs
  - Kafka / streaming platforms
  - Cloud infrastructure (Terraform, IaC)
- Identify whether issues originate from:
  - Upstream/downstream dependencies
  - Streaming platform
  - Infrastructure
  - Application code

Tooling & Platforms
- Deep hands-on experience with Dynatrace (mandatory)
- Experience with:
  - OpenTelemetry
  - Prometheus / Grafana
  - ELK / EFK
  - Cloud-native monitoring (AWS/Azure/GCP)
- Strong skills in JSON-based telemetry manipulation and enrichment

GenAI & LLM Enablement
- Apply GenAI / LLMs for:
  - Incident summarization
  - Root cause explanation
  - Runbook recommendations
  - Auto-remediation suggestions
- Collaborate with platform teams to operationalize GenAI safely