Senior DevOps/SRE Engineer

We are seeking a high-caliber Senior DevOps/SRE Engineer to join a mission-critical team in Washington, DC, focused on maturing our cloud-native ecosystem. In this role, you will bridge the gap between development and operations by architecting resilient infrastructure, implementing robust SRE principles like SLOs and error budgeting, and driving a "Security-First" CI/CD culture. As a technical leader, you will not only manage high-availability AWS environments and Kubernetes orchestration but also serve as a force multiplier for our engineering teams by building self-service platforms and automated governance tools that ensure operational excellence and cost-efficiency.

Key Responsibilities

Reliability & Incident Management: Define and maintain SLOs/SLIs, manage error budgets, and lead high-level incident response and blameless postmortem analyses.

Infrastructure as Code (IaC): Architect and maintain secure, scalable environments using Terraform, Ansible, and CloudFormation to ensure repeatable deployments.

CI/CD & Deployment Strategy: Design secure delivery pipelines (GitHub Actions/Jenkins) incorporating automated rollbacks, canary releases, and blue-green deployment patterns.

Comprehensive Observability: Build and manage full-stack telemetry pipelines, dashboards, and alerting systems using Prometheus, Grafana, Datadog, or ELK.

SecDevOps Integration: Enforce security-as-code by integrating SAST/DAST, secrets scanning, and SBOM validation into the automated software development lifecycle.

Efficiency & FinOps: Monitor cloud spend trends and implement right-sizing strategies to ensure high performance at an optimal cost-to-value ratio.

Internal Enablement: Develop shared playbooks, reusable automation modules, and self-service tools to boost developer velocity and reduce friction.

Technical Leadership: Mentor cross-functional teams and establish organization-wide best practices for fault tolerance and operational readiness.

Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related technical field.

Experience: 5+ years in DevOps, SRE, or Platform Engineering, including leadership experience in production automation.

Cloud Expertise: 3+ years of hands-on experience managing high-availability production environments, specifically within AWS (IAM, Networking, Compute).

Containerization: Deep proficiency with Kubernetes, Docker, and Linux systems administration.

Tooling Mastery: Advanced experience with Terraform, GitOps patterns, and CI/CD security tollgates.

Automation Skills: Strong scripting proficiency in Python, Go, or Bash for custom tool development.

Operational Mindset: Proven track record in chaos engineering, capacity modeling, and managing complex observability stacks.

Communication: Exceptional ability to document technical architectures and lead collaborative engineering initiatives.