{"schemaVersion":"jobsearcher.job.v1","id":"3c5a7ce3e155f3e173009fa0","url":"https://jobsearcher.com/jobs/3c5a7ce3e155f3e173009fa0","canonicalUrl":"https://jobsearcher.com/jobs/3c5a7ce3e155f3e173009fa0","title":"Cloud DevOps Engineer","description":"Cloud Engineer – Observability and SRE (Grade 10)\nBay Area CA- onsite role\nMax pay rate: $65/hr w2 + benefits\n7 month initial duration\n\nPosition Summary\nThe Grade 10 Cloud Engineer within the Customer’s Cloud Collaboration Technology Group will play a key role in building and operating scalable observability and infrastructure platforms supporting Webex microservices. This role requires strong hands-on expertise in Kubernetes, cloud infrastructure, and observability systems, along with the ability to operate independently and to own components end-to-end in production environments. Candidates will demonstrate extensive use of generative AI tools for code generation and production system troubleshooting.\n\nKey Responsibilities\n• Design, develop, and operate observability platforms – to perform logging, metrics, and/or tracing – for Webex microservices.\n• Manage and optimize Kubernetes clusters across multi-region environments.\n• Own CI/CD pipelines using Argo CD and Helm.\n• Implement Infrastructure as code (IaC) using Terraform on AWS.\n• Operate monitoring ecosystems, including but not limited to:\no OpenSearch/ELK,\no Prometheus,\no Grafana,\no Splunk, and\no Kafka.\n• Build automation to detect and remediate production issues.\n• Ensure security compliance through vulnerability patching.\n• Collaborate cross-functionally to improve reliability.\n• Participate in on-call rotations and incident response.\n• Contribute to distributed system design and operations.\n\nRequired Skills\nGeneral Abilities\n• Bachelor’s degree in computer science or related field\nGeneral Technical Skills\n• At least eight (8) years of experience in a DevOps and/or SRE platform engineering role\n• Incident response and on-call operations: Demonstrated experience in a 24/7 production environment, including but not limited to:\no Triaging alerts\no Leading incident response\no Writing post-incident reviews\no Maintaining SLA commitments across large-scale distributed systems\n• IaC and automation: Proficiency with Terraform, Ansible, and/or equivalent IaC tooling for provisioning and managing cloud infrastructure at scale on AWS\n• Scripting and development: Working proficiency in Python, Golang, and/or Bash for building automation scripts, operational tooling, and/or CI/CD pipeline integrations (e.g., Drone, GitHub Actions, Argo CD)\nSpecific Technical Skills\n• Kubernetes and container orchestration: Production experience operating and troubleshooting workloads on Kubernetes at large scale (i.e., hundreds of deployments and thousands of pods), including but not limited to:\no Helm chart management\no Pod scheduling\no Resource tuning\no Multi-cluster operations\n• Observability stack expertise: Hands-on experience – performing pipeline design, query optimization, and/or capacity planning for high-volume environments – in at least two (2) of the following:\no OpenSearch/Elasticsearch\no Prometheus/Mimir\no Grafana\no Loki\no Splunk\no Logstash\n\nDesired Skills\n• Apache Kafka/AWS MSK: Experience in at least one (1) of the following:\no Operating or tuning Kafka clusters at scale\no Managing the following across high-throughput streaming pipelines: Topic configurations, ACLs, Consumer lag, and/or Schema registries\n• Splunk administration: Experience deploying, managing, and/or migrating Splunk Enterprise environments with Kubernetes-based log shipping architectures, including but not limited to:\no Forwarder management,\no Search optimization,\no Index lifecycle, and/or\no Integration\n• OpenTelemetry and distributed tracing: Experience with deploying OpenTelemetry for data collection and application performance monitoring\n• Security frameworks and container hardening: Familiarity with at least one (1) of the following (for vulnerability remediation at scale):\no Government or industry security certification standards; examples: FedRAMP, STIG, IL5, ISO 27001, SOC 2\no Container image hardening practices\no Security scanning tools (e.g., Anchore, Grype)\n• AI-augmented operations: Experience using LLMs, AI coding assistants, and/or custom AI agents (e.g., MCP servers, Copilot, Claude) to:\no Accelerate engineering workflows,\no Automate runbooks, and/or\no Assist with incident triage\n• Deployment pipelines (Argo CD/Helm bundles): Experience with at least one (1) of the following across multi-region clusters:\no GitOps-style deployment workflows\no Argo CD application management\no Helm bundle patterns\no Blue/green or canary release strategies\n• Cost optimization and capacity planning: Experience in at least one (1) of the following in large-scale logging and/or metrics platforms:\no Right-sizing cloud resources\no Analyzing spending across AWS services\no Optimizing data retention policies (ISM/ILM)\no Reducing storage costs","company":"Pinnacle Group","rawCompany":"pinnacle group","city":"Hayward","state":"CA","isRemote":false,"isActive":false,"createdAt":"2026-05-10T03:59:28.496Z","occupations":[{"code":"15-1299.08","title":"Computer Systems Engineers/Architects","slug":"computer-systems-engineers-architects"},{"code":"15-1252.00","title":"Software Developers","slug":"software-developers"},{"code":"15-1244.00","title":"Network and Computer Systems Administrators","slug":"network-and-computer-systems-administrators"}],"industries":[{"code":"541512","title":"Computer Systems Design Services","slug":"computer-systems-design-services"},{"code":"513210","title":"Software Publishers","slug":"software-publishers"},{"code":"541511","title":"Custom Computer Programming Services","slug":"custom-computer-programming-services"}],"jobPosting":{"@context":"https://schema.org","@type":"JobPosting","title":"Cloud DevOps Engineer","description":"Cloud Engineer – Observability and SRE (Grade 10)\nBay Area CA- onsite role\nMax pay rate: $65/hr w2 + benefits\n7 month initial duration\n\nPosition Summary\nThe Grade 10 Cloud Engineer within the Customer’s Cloud Collaboration Technology Group will play a key role in building and operating scalable observability and infrastructure platforms supporting Webex microservices. This role requires strong hands-on expertise in Kubernetes, cloud infrastructure, and observability systems, along with the ability to operate independently and to own components end-to-end in production environments. Candidates will demonstrate extensive use of generative AI tools for code generation and production system troubleshooting.\n\nKey Responsibilities\n• Design, develop, and operate observability platforms – to perform logging, metrics, and/or tracing – for Webex microservices.\n• Manage and optimize Kubernetes clusters across multi-region environments.\n• Own CI/CD pipelines using Argo CD and Helm.\n• Implement Infrastructure as code (IaC) using Terraform on AWS.\n• Operate monitoring ecosystems, including but not limited to:\no OpenSearch/ELK,\no Prometheus,\no Grafana,\no Splunk, and\no Kafka.\n• Build automation to detect and remediate production issues.\n• Ensure security compliance through vulnerability patching.\n• Collaborate cross-functionally to improve reliability.\n• Participate in on-call rotations and incident response.\n• Contribute to distributed system design and operations.\n\nRequired Skills\nGeneral Abilities\n• Bachelor’s degree in computer science or related field\nGeneral Technical Skills\n• At least eight (8) years of experience in a DevOps and/or SRE platform engineering role\n• Incident response and on-call operations: Demonstrated experience in a 24/7 production environment, including but not limited to:\no Triaging alerts\no Leading incident response\no Writing post-incident reviews\no Maintaining SLA commitments across large-scale distributed systems\n• IaC and automation: Proficiency with Terraform, Ansible, and/or equivalent IaC tooling for provisioning and managing cloud infrastructure at scale on AWS\n• Scripting and development: Working proficiency in Python, Golang, and/or Bash for building automation scripts, operational tooling, and/or CI/CD pipeline integrations (e.g., Drone, GitHub Actions, Argo CD)\nSpecific Technical Skills\n• Kubernetes and container orchestration: Production experience operating and troubleshooting workloads on Kubernetes at large scale (i.e., hundreds of deployments and thousands of pods), including but not limited to:\no Helm chart management\no Pod scheduling\no Resource tuning\no Multi-cluster operations\n• Observability stack expertise: Hands-on experience – performing pipeline design, query optimization, and/or capacity planning for high-volume environments – in at least two (2) of the following:\no OpenSearch/Elasticsearch\no Prometheus/Mimir\no Grafana\no Loki\no Splunk\no Logstash\n\nDesired Skills\n• Apache Kafka/AWS MSK: Experience in at least one (1) of the following:\no Operating or tuning Kafka clusters at scale\no Managing the following across high-throughput streaming pipelines: Topic configurations, ACLs, Consumer lag, and/or Schema registries\n• Splunk administration: Experience deploying, managing, and/or migrating Splunk Enterprise environments with Kubernetes-based log shipping architectures, including but not limited to:\no Forwarder management,\no Search optimization,\no Index lifecycle, and/or\no Integration\n• OpenTelemetry and distributed tracing: Experience with deploying OpenTelemetry for data collection and application performance monitoring\n• Security frameworks and container hardening: Familiarity with at least one (1) of the following (for vulnerability remediation at scale):\no Government or industry security certification standards; examples: FedRAMP, STIG, IL5, ISO 27001, SOC 2\no Container image hardening practices\no Security scanning tools (e.g., Anchore, Grype)\n• AI-augmented operations: Experience using LLMs, AI coding assistants, and/or custom AI agents (e.g., MCP servers, Copilot, Claude) to:\no Accelerate engineering workflows,\no Automate runbooks, and/or\no Assist with incident triage\n• Deployment pipelines (Argo CD/Helm bundles): Experience with at least one (1) of the following across multi-region clusters:\no GitOps-style deployment workflows\no Argo CD application management\no Helm bundle patterns\no Blue/green or canary release strategies\n• Cost optimization and capacity planning: Experience in at least one (1) of the following in large-scale logging and/or metrics platforms:\no Right-sizing cloud resources\no Analyzing spending across AWS services\no Optimizing data retention policies (ISM/ILM)\no Reducing storage costs","datePosted":"2026-05-10T03:59:28.496Z","dateModified":"2026-05-10T03:59:28.496Z","hiringOrganization":{"@type":"Organization","name":"Pinnacle Group","sameAs":"https://jobsearcher.com"},"jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Hayward","addressRegion":"CA","addressCountry":"US"}},"identifier":{"@type":"PropertyValue","name":"JobSearcher","value":"3c5a7ce3e155f3e173009fa0"},"url":"https://jobsearcher.com/jobs/3c5a7ce3e155f3e173009fa0"}}