Site Reliability Engineer
Mountain View, CA
April 3rd, 2026
Job Description
Onsite - Bay Area, CA

Relevant Skills and Experience

What You'll Do (Day-to-Day)
- Own and manage our cloud infrastructure (GCP or AWS, on-prem).
- Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters).
- Implement and improve CI/CD pipelines (GitHub Actions).
- Write and maintain Infrastructure as Code (Terraform).
- Monitor system health and performance using Grafana and other observability tools.
- Ensure high availability, reliability, and uptime across platforms.
- Handle infrastructure maintenance, upgrades, and scaling.
- Administer and improve our platform architecture and apply general security best practices across the stack.

Note: This is an internal-facing role (no customer interaction).

Must-Have:
- 4+ years in SRE, DevOps, or Infrastructure Engineering
- Solid experience with GCP or AWS (hybrid/on-prem a plus)
- Experience with Kubernetes cluster management (GPU experience a bonus)
- Hands-on with Terraform and CI/CD (GitHub)
- Experience with monitoring/observability (Grafana, etc.)
- Strong understanding of high availability and infrastructure reliability
- Familiarity with platform/cluster architecture and administration
- Security mindset and ability to apply best practice

Nice-to-Have:
- Startup experience (you enjoy building, not just maintaining)
- Experience with scalable GPU infrastructure for AI/ML