MDKJR Open Practice Staff Site Reliability Engineer (SRE) - DevOps
Job Description
Emerald Resource Group
Sr DevOps Engineer
Salary: Up to 110k for the right background
Hybrid: In Ritchfield, Ohio
Only US Citizens will be considered

About the Role
We are seeking a DevOps Engineer to scale our high-velocity payments platform and manage transactions within our multi-tenant AWS environment. You will dive into our established architecture to improve automation and system resiliency while solving live production challenges. This role requires a practical engineer to optimize processing engines and maintain compliance as our merchant volume grows.

Responsibilities
- Ensure the reliability, availability, and performance of a multi-tenant production system
- Scale and operate AWS-based infrastructure supporting a Java web application
- Monitor and troubleshoot issues across application, database, cache, and data warehouse layers
- Improve observability through metrics, logging, and alerting
- Participate in on-call rotations and lead incident response and root cause analysis
- Identify performance bottlenecks and scaling limits in a shared-tenant environment
- Automate operational tasks and reduce toil where it matters most
- Work within existing frameworks and tooling to make systems safer and more scalable
- Partner with developers to improve deployments, capacity planning, and failure handling
- Implement automated load and fuzz testing
- Define key service level objectives (SLOs)

Technologies You'll Work With
- AWS (EC2, ECS, RDS, ElastiCache, Redshift, and related services)
- Java-based web applications
- MySQL (performance tuning, scaling, reliability)
- Amazon ElastiCache (Redis/Memcached)
- Amazon Redshift
- Monitoring and alerting tools (Graphite, Grafana, CloudWatch)

Qualifications
- 3+ years of experience in SRE, DevOps, or production operations roles
- Strong understanding of AWS infrastructure and cloud-native scaling patterns
- Experience supporting Java applications in production
- Solid knowledge of MySQL performance, replication, and scaling strategies
- Experience operating cache layers and data stores at scale
- Understanding of multi-tenant architectures, including isolation, noisy-neighbor issues, and capacity planning
- Strong Linux fundamentals and troubleshooting skills
- Ability to stay calm, think clearly, and prioritize during incidents