Senior Engineering Manager, Cloud Platform

Job Description:
Lead software engineering teams providing infrastructure-as-code to manage cloud infrastructure.
Establish governance and mechanisms for application development teams to self-service infrastructure provisioning, while providing for best practices and controls.
Provide documentation, training, and support to ensure feature dev teams are leveraging self-service capabilities.
Hire experienced site reliability staff, and a line manager to grow and oversee the SRE team.
Professionalize incident management. Define and document incident processes and practices for your SRE team and for the application feature teams.
Drive incident professionalism across the engineering organization through training and process adoption.
Establish design-before-build discipline. Facilitate lightweight design documents, architectural decision records, and working group reviews.
Use design reviews, code reviews, and blameless retrospectives to drive a culture of quality and excellence in engineering.

Requirements:
Demonstrated experience leading teams operating SaaS service infrastructure.
Deep hands-on experience deploying and operating production infrastructure on public cloud platforms (AWS strongly preferred; Azure and GCP familiarity a plus).
Strong command of Infrastructure as Code, including Terraform; experience with Crossplane and GitOps patterns strongly preferred.
Experience managing production Kubernetes environments at scale.
Solid understanding of security best practices including zero trust architecture, secrets management, identity and access management, and software supply chain security.
Experience building and operating self-service infrastructure platforms that enable application development teams, while balancing self-service and developer productivity with maintainability and security.
Experience leading or building SRE functions, including incident management processes, on-call programs, SLO/SLA definition, and operational runbooks.
Deep hands-on experience with observability: application performance management, logs and traces, and golden signals and service-specific metrics.
Expert in leading infrastructure teams to translate business and product requirements into technical requirements and engineering deliverables.
Proven ability to hire, develop, and retain high-performing engineers and engineering managers in remote or distributed environments.

Benefits:
Health insurance
Vision insurance
Dental insurance
Flexible vacation policy
Generous parental leave