Senior DevOps Engineer
This is a U.S.-based position. All of the programs we support require U.S. citizenship to be eligible for employment. All work must be conducted within the continental U.S.

Who we are:
Raft (https://TeamRaft.com) is a customer-obsessed, non-traditional defense tech company dedicated to empowering U.S. military and government agencies with cutting-edge AI/ML and data solutions. We are a leader in autonomous data fusion and Agentic AI, with a purposeful focus on Distributed Data Systems, Platforms at Scale, and Complex Application Development. With headquarters in McLean, VA, our range of clients includes innovative federal and public agencies leveraging design thinking, a cutting-edge tech stack, and a cloud-native ecosystem. We build digital solutions that impact the lives of millions of Americans.

About the role:
Raft is building mission-critical data platforms for the Department of War that process billions of events per day from hundreds of sensors and operational sources, delivering intelligence to operators who use it to make time-sensitive decisions. Our platform runs across multiple classification levels and deployment environments.

As a Senior DevOps Engineer at Raft, you won't be operating in a pure infrastructure lane. You will be expected to understand the software you're deploying, contribute to it when needed, and engage with the data pipelines flowing through the systems you manage.
This is a role for someone who thinks end-to-end, from data ingest and pipeline performance through to Kubernetes-based deployment, observability, and secure operations in defense environments.

You will work across cloud and on-premises environments, partner closely with software and data engineers, and help Raft maintain the operational rigor and platform reliability that our most demanding customers depend on.

What You'll Do
- Design, implement, and maintain secure Kubernetes-based infrastructure supporting data platform workloads across cloud and on-premises environments
- Build, manage, and improve CI/CD pipelines using GitLab and GitOps-based delivery patterns, enabling reliable, repeatable deployments across multiple classification levels
- Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform and Ansible to provision, configure, and lifecycle-manage platform infrastructure
- Collaborate directly with software engineers to understand service architectures, dependencies, and runtime behavior, and contribute code-level changes where needed to improve deployability, reliability, or observability
- Support and optimize data streaming and processing pipelines built on technologies such as Kafka, Kafka Streams, Flink, and Pinot, diagnosing bottlenecks, tuning configurations, and ensuring data integrity across the platform
- Implement and manage platform observability using monitoring (Prometheus, Grafana), logging (Fluent Bit, Loki, Kibana), and alerting tooling to maintain operational awareness in production environments
- Apply and enforce DevSecOps practices, including container hardening, vulnerability scanning, software supply chain security, and compliance-driven deployment patterns in regulated government environments
- Manage and debug complex Helm chart deployments, service mesh configurations (Istio), and Kubernetes networking across multi-cluster and multi-environment topologies
- Support operations across multiple deployment targets (cloud-hosted AWS and Azure, on-premises data centers, and edge/tactical environments), adapting platform patterns to the constraints of each
- Write clean, maintainable automation and tooling in Java or Go to accelerate platform operations, reduce toil, and improve developer experience across engineering teams
- Engage directly with customers at the most operationally demanding locations in the Department of War

What we are looking for:
- 5+ years of relevant hands-on experience in DevOps or platform engineering roles
- 5+ years of production experience with Docker and Kubernetes, including provisioning, operating, and troubleshooting clusters in real-world environments
- Strong experience building and maintaining CI/CD pipelines, with hands-on proficiency in GitLab CI, GitOps workflows (Flux, ArgoCD), and modern software delivery practices
- Experience supporting data-intensive platforms using streaming technologies such as Kafka or Flink, including configuration, tuning, and operational support
- Solid understanding of data engineering fundamentals, including ETL/ELT pipeline design, data storage patterns, data governance concepts, and integration with downstream consumers
- Proficiency with Infrastructure as Code tooling, particularly Terraform; experience with Ansible or similar configuration management tools
- Strong Helm proficiency, including authoring and maintaining charts for complex multi-service deployments
- Hands-on experience with platform observability tooling: Prometheus, Grafana, Fluent Bit, Loki, or Elasticsearch/Kibana
- Demonstrable software development skills in Java and/or Go; comfortable reading, modifying, and contributing to application codebases, not just deploying them
- Experience with cloud infrastructure on AWS and/or Azure, including networking, IAM, storage, and managed Kubernetes services
- Strong systems thinking, troubleshooting discipline, and the ability to work independently in a fast-moving environment with competing priorities
- Experience applying secure and compliant deployment practices in regulated or government environments
- Active Secret clearance required; must be eligible for and willing to obtain a Top Secret/SCI clearance
- Ability to obtain Security+ certification within the first 90 days of employment
- Ability to travel up to 25%

Highly preferred:
- Experience with service mesh technologies, particularly Istio, including traffic management, mTLS, and observability integration
- Familiarity with Kubernetes-based ML/AI platforms such as Kubeflow, KServe, or Ray, and experience supporting GPU-enabled workloads
- Experience with software supply chain security tools, including container image scanning, SBOM generation, and runtime vulnerability management
- Background supporting deployments across multiple classification levels or air-gapped/disconnected environments
- Experience with package and dependency management across polyglot environments (Maven, Gradle, NPM, Yarn, pip)
- Familiarity with compliance frameworks relevant to DoW software deployment, including RMF, STIGs, and IL4/IL5/IL6 requirements
- Contributions to or ownership of internal developer platforms, golden path tooling, or shared infrastructure services
- Experience with distributed tracing and APM tooling (e.g., OpenTelemetry, Jaeger, Tempo)
- Existing TS/SCI clearance strongly preferred

What Success Looks Like
- Platform deployments are reliable, repeatable, and secure across every environment Raft operates in, from commercial cloud to classified on-premises
- Engineering teams move faster because CI/CD workflows, infrastructure tooling, and deployment patterns are solid, well-documented, and easy to use
- Data pipelines running through Raft's platform are stable, observable, and performant, with clear ownership of issues when they arise
- You've earned the trust of software engineers by understanding what they've built and engaging meaningfully in conversations about architecture, runtime behavior, and operational trade-offs
- Compliance and security posture across deployment environments is continuously maintained, not bolted on

Clearance Requirements:
Minimum active Secret clearance with the ability to obtain and maintain an active TS/SCI security clearance

Salary Range: $150,000.00 - $200,000.00

Work Type:
Hybrid with up to 25% travel
Active Secret clearance required to start; TS/SCI eligibility required

What we will offer you:
- Highly competitive salary
- Fully covered healthcare, dental, and vision
- 401(k) and company match
- Take-as-you-need PTO + 11 paid holidays
- Education & training benefits
- Generous referral bonuses
- And more!

Our Vision Statement:
We bridge the gap between humans and data through radical transparency and our obsession with the mission.

Our Customer Obsession:
We will approach every deliverable like it's a product. We will adopt a customer-obsessed mentality. As we grow and our footprint becomes larger, teams and employees will treat each other not only as teammates but as customers. We must live the customer-obsessed mindset, always. This will help us scale, and it will translate to the interactions that our Rafters have with their clients and the other product teams they integrate with. Our culture will enable our success and set us apart from other companies.

How do we get there?
Public-sector modernization is critical for us to live in a better world. We, at Raft, want to innovate and solve complex problems. And, if we are successful, our generation and the ones that follow us will live in a delightful, efficient, and accessible world where out-of-the-box thinking and collaboration are the norm.

Raft's core philosophy is Ubuntu: I Am, Because We Are. We support our "nadi" by elevating the other Rafters. We work as a hyper-collaborative team where each team member brings a unique perspective, adding value that did not exist before. People make Raft special. We celebrate each other and our cognitive and cultural diversity. We are devoted to our practice of innovation and collaboration.

We're an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.