
Staff Site Reliability Engineer

Location: New York, NY (Hybrid) / Remote
Department: Engineering

The Role

Flowcode is seeking a Staff Site Reliability Engineer (SRE) to lead reliability and infrastructure efforts across our platforms. This role will help shape and drive our infrastructure strategy, operational rigor, and observability while building and maintaining the systems and tooling required to support Flowcode's continued growth.

As a technical leader within our engineering organization, you will grow and operate scalable cloud infrastructure, establish best practices around deployment and reliability, and partner closely with engineering teams to ensure systems are scalable, resilient, and observable.

This role combines hands-on engineering with systems and architectural leadership. You will be a pivotal member of our engineering leadership team, leading the charge on reliability and long-term infrastructure growth.

What You'll Do

Reliability & Infrastructure Leadership
Lead Flowcode's site reliability engineering strategy and implementation
Improve system availability, scalability, and resilience across our platforms
Drive operational best practices across our engineering teams

Cloud & Platform Engineering
Maintain, grow, and operate scalable infrastructure on our AWS platform
Lead infrastructure best practices for scalability, failover, and disaster recovery
Work with critical infrastructure vendors on monitoring, analysis, and security

CI/CD & Deployment Automation
Build and maintain modern deployment and testing pipelines
Grow and maintain our GitOps workflows using ArgoCD
Enable safe, reliable releases through automated testing and validation

Observability & Monitoring
Manage monitoring, logging, and alerting systems
Improve system visibility through metrics, tracing, and logging

Technical Leadership
Serve as a reliability and infrastructure subject matter expert across engineering
Mentor engineers and promote best practices
Collaborate with our engineering and data teams to ensure new systems are built for reliability and scale

Qualifications

Required
8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or Platform Engineering
Hands-on experience with Kubernetes and container orchestration
Experience building and maintaining CI/CD and deployment pipelines
Experience implementing and growing GitOps workflows with tools such as ArgoCD
Familiarity with GitHub Actions, ideally in a production pipeline with multiple contributors
Experience with observability platforms, code quality tools, and common security practices
Strong scripting or programming skills (Python, Go, or similar)
Experience supporting high-scale distributed systems
Experience with Infrastructure as Code (Terraform, Pulumi, or CloudFormation)
Strong familiarity with core AWS services (EKS, EC2, S3, RDS, etc.)

Preferred
Experience designing highly available, multi-region architectures
Experience implementing progressive delivery or deployment strategies
Experience building internal developer platform tooling

Flowcode is not for everyone. We hire with a pinhole lens: only those with the rare combination of intellectual horsepower, execution velocity, and uncompromising drive will thrive here. If you are seeking to operate at the highest levels of performance and impact, we want to meet you.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.