Senior Site Reliability Engineer, Production Engineering
Seattle, WA
April 5th, 2026
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century's most innovative companies to the defense industry, Anduril is changing how military systems are designed, built and sold. Anduril's family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a real-time, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.

ABOUT THE TEAM
The Production Engineering team is a newly formed organization within Anduril's Software Platform, dedicated to ensuring the reliability, performance, and scalability of mission-critical systems that directly support our warfighters in the field. We solve complex reliability challenges at massive scale, ensuring that critical components of Lattice, Anduril's autonomous command and control platform, operate flawlessly in the most demanding operational environments.

This is a foundational role, and you will be among the first hires building this team from the ground up. You'll have the unique opportunity to shape the technical direction, establish best practices, and define what production engineering excellence means at Anduril. Our team operates at the intersection of software engineering and systems reliability, building the infrastructure, tooling, and processes that keep our systems operational 24/7/365.

ABOUT THE ROLE
We are seeking an experienced Senior Site Reliability Engineer who is passionate about building resilient, highly available systems that scale to meet the demands of the core systems powering Lattice.
You will work closely with platform engineering teams, product developers, and field operations to proactively identify reliability risks, implement defensive strategies, and continuously improve the operational excellence of our software platform. If you thrive on solving hard problems at scale and want your work to have direct impact on national security, this is the role for you.

WHAT YOU'LL DO
- Design and implement comprehensive monitoring, observability, and alerting systems to ensure early detection of reliability issues across the Lattice platform
- Drive incident response and conduct blameless postmortems to identify systemic improvements and prevent recurrence of production issues
- Build and maintain infrastructure automation using tools like Terraform, Kubernetes operators, and custom tooling to manage large-scale distributed systems
- Establish and track Service Level Objectives (SLOs) and Error Budgets to balance feature velocity with system reliability
- Partner with software engineering teams to improve system architecture for reliability, implementing patterns like circuit breakers, graceful degradation, and chaos engineering
- Develop capacity planning models and performance testing frameworks to ensure systems can handle growth and peak operational demands
- Create runbooks, documentation, and training materials to enable teams to operate production systems effectively
- Lead cross-functional efforts to improve deployment safety through progressive rollouts, automated testing, and rollback capabilities
- Implement security best practices and compliance controls for production environments handling sensitive defense data
- Build tooling and automation to reduce toil and improve operational efficiency for the engineering organization
- Participate in on-call rotations and serve as an escalation point for critical production incidents

REQUIRED QUALIFICATIONS
- 7+ years of engineering experience, with at least 3 years focused on SRE, production operations, or infrastructure engineering
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
- Deep expertise with Kubernetes in production environments, including operational challenges at scale (100+ nodes)
- Strong programming skills in one or more languages such as Go, Python, Rust, or Java, with the ability to build production-grade tooling
- Proven experience designing and implementing observability stacks (metrics, logging, tracing) using tools like Prometheus, Grafana, ELK/EFK, or equivalent
- Hands-on experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code practices
- Demonstrated ability to debug complex distributed systems issues across multiple layers of the stack
- Track record of improving system reliability through architectural changes, not just operational band-aids
- Strong incident management and communication skills, with experience leading responses to critical outages
- Must be a U.S. Person due to required access to U.S. export controlled information or facilities
- Eligible to obtain and maintain an active U.S. Secret security clearance

PREFERRED QUALIFICATIONS
- Experience with defense, aerospace, or other mission-critical systems where downtime has severe consequences
- Expertise in performance optimization and capacity planning for high-throughput, low-latency systems
- Knowledge of chaos engineering principles and experience implementing resilience testing frameworks
- Experience with service mesh technologies (Istio, Linkerd) and advanced traffic management patterns
- Background in database operations and optimization (PostgreSQL, Cassandra, or similar at scale)
- Familiarity with CI/CD platforms and deployment automation (ArgoCD, FluxCD, Spinnaker, Jenkins)
- Understanding of networking fundamentals including load balancing, DNS, TLS/SSL, and network security
- Experience with configuration management and secrets management solutions (Vault, Sealed Secrets, SOPS)
- Strong written and verbal communication skills, with the ability to explain technical concepts to non-technical stakeholders
- Active Secret or higher security clearance

US Salary Range
$166,000-$220,000 USD
The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. Actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations. Highly competitive equity grants are included in the majority of full-time offers and are considered part of Anduril's total compensation package.
Additionally, Anduril offers top-tier benefits for full-time employees, including:

Healthcare Benefits
- US Roles: Comprehensive medical, dental, and vision plans at little to no cost to you.
- UK & AUS Roles: We cover full cost of medical insurance premiums for you and your dependents.
- IE Roles: We offer an annual contribution toward your private health insurance for you and your dependents.

Additional Benefits
- Income Protection: Anduril covers life and disability insurance for all employees.
- Generous time off: Highly competitive PTO plans with a holiday hiatus in December. Caregiver & Wellness Leave is available to care for family members, bond with a new baby, or address your own medical needs.
- Family Planning & Parenting Support: Coverage for fertility treatments (e.g., IVF, preservation), adoption, and gestational carriers, along with resources to support you and your partner from planning to parenting.
- Mental Health Resources: Access free mental health resources 24/7, including therapy and life coaching. Additional work-life services, such as legal and financial support, are also available.
- Professional Development: Annual reimbursement for professional development.
- Commuter Benefits: Company-funded commuter benefits based on your region.
- Relocation Assistance: Available depending on role eligibility.

Retirement Savings Plan
- US Roles: Traditional 401(k), Roth, and after-tax (mega backdoor Roth) options.
- UK & IE Roles: Pension plan with employer match.
- AUS Roles: Superannuation plan.

The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process.

To view Anduril's candidate data privacy policy, please visit https://anduril.com/applicant-privacy-notice/.
604 matching similar jobs near Seattle, WA
- Enterprise Architect Manager/Senior Manager, Banking & Cap Mkts
- AI Solutions Strategist (Palantir Foundry + AIP)
- BAULI4-Computing Architect 4 - B78-Information/Data Architecture
- Engineering Manager, Data Science, SDK & Cloud Engineering
- AI Software Engineer, Forward Deployed
- Field Application Engineer - Datacenter (Seattle)
- Software Development Manager, AWS Marketing Demand Services
- Senior Hardware Development Manager, AWS Accelerator Servers
- Manager, Software Development (SDM) , Amazon Managed Service for Apache Flink
- Senior Software Development Manager, Engagement Growth
- Network Development Engineer II, Capacity Engineering
- Software Engineering Manager, AWS CloudFormation
- Sr. Product Manager Technical, AWS Startups
- Sr. Infrastructure Reliability Engineer, Infrastructure Reliability & Quality
- Manager III, Software Dev - AMZ9676874
- Automation Solutions Engineer, Central RME, Global Jam Program
- Solutions Architect III - AMZ9725547
- Software Development Manager - Core distro, Amazon Linux
- Sr. PMT-ES, AWS Transform
- Senior Specialist Solutions Architect, Amazon Connect T&C Development, Applied AI Acceleration Solutions
- Software Development Engineer, AWS SDE Centralized Team
- System Development Engineer II, Prime Air
- Systems Development Engineer, Manufacturing Test Infrastructure, Project Kuiper
- Software Engineering Manager, AWS EKS
- Principal PMT - External Services, AWS Identity and Access Management
- Senior Hardware Development Engineer AWS AI & ML, Accelerator Servers
- Controls Design Engineer, Controls Design Engineering
- Controls Design Engineer, Controls
- Hardware Dev Engr II - AMZ9674068
- Software Development Engineer, Amazon Connect Contact Lens
- Sr System Development Manager , Amazon Leo Enterprise Engineering & Technology
- SAP ABAP and BTP Architect, AWS SAP
- Software Dev Engineer, Developer Agents and Experiences
- Principal, Fin. Transformation, Finance and Business Integration
- Program Manager, AWS Services Finance
- Software Development Engineer, Management Plane
- Manager, Enterprise Security Engineering
- Senior Software Engineer - Forge Factory Automation