
Senior Site Reliability Engineer

Artech | Bethesda, MD | April 15th, 2026
Job Title: SRE
Work Location: Bethesda, MD (Onsite)
Employment: W2 only; no C2C
Work Authorization: USC, GC, or TN visa holders only

Responsibilities:
- Lead cloud architecture and site reliability strategy for the Client Growth Platform (MGP), including modernization of CI/CD pipelines from Jenkins to GitLab CI and Harness.
- Contribute to Kubernetes operator customization and REST-based automation to streamline secrets lifecycle management and reduce operational risk.
- Define structured logging and telemetry standards for distributed microservices, enabling trace-level observability and cost-efficient logging.
- Reduce MTTR by ~30% through proactive monitoring, automated alerting, and standardized postmortem practices.
- Collaborate with data engineering teams to ensure reliable execution of distributed data processing platforms (Apache Spark, Hadoop) under production workloads.
- Architect and govern scalable CI/CD pipelines for microservices on EKS, defining reusable deployment patterns, Helm charts, and security controls.
- Design and deploy a high-availability, multi-tier disaster recovery (DR) application stack (Apache, Spring Boot, PostgreSQL) on AWS to support business continuity.
- Develop reusable Terraform modules for AWS infrastructure (EKS, ECS, EC2, RDS, ALB, VPC) and Confluent Cloud, ensuring compliant and repeatable provisioning.
- Integrate the Vault operator on EKS and implement automated API key rotation workflows using HashiCorp Vault.
- Enforce cloud security through IaC policies, AWS Config, GuardDuty, Security Hub, and automated compliance scans (CIS, OpenSCAP).
- Optimize AWS cost and performance using Cloudability, autoscaling strategies, and capacity tuning.