
Platform Engineer SRE - TX, NC, AZ

Apex Systems | Phoenix, AZ | April 29th, 2026
Systems Operations Engineer 4 - Harness CD / SRE

Client: Financial Services
Team: Platform Engineering / SRE (Harness CD)
Location: Zone 2 Approved Sites:
- USA-TX-IRVING 401 Las Colinas Blvd W, Bldg A-111432
- 2222 W Rose Garden Ln, Phoenix, AZ 85027
- Charlotte, NC 28202
Contract Length: 12 Months (Conversion to FTE after 12 months)
Work Model: Hybrid (RTO 3 Days Onsite)
Pay Rate: $61 - $65

Top Requirements:
- 5-7+ years in DevOps / SRE / Platform / Cloud Engineering
- Hands-on experience with Harness CD (enterprise operations and integrations)
- Strong experience with Kubernetes / OpenShift, Linux, cloud services, and deployment best practices
- Solid understanding of CI/CD workflows and release automation
- SRE concepts: SLIs/SLOs, error budgets, incident response, operational maturity improvements
- Automation & IaC: Python/Bash/PowerShell and Terraform, Ansible, Helm
- Observability: Prometheus, Grafana, Splunk/ELK, AppDynamics (dashboards, alerts, RCA)

Plusses (Preferred Qualifications):
- Operating CD platforms at enterprise scale (hundreds of teams, multi-region)
- Experience in Azure and/or GCP, hybrid cloud
- DevSecOps controls, policy enforcement, governance pipelines
- Experience with platform upgrades, migrations, and modernization projects
- Proven contributions to BCP validation, backup verification, resiliency improvements

Job Summary:
The Systems Operations Engineer 4 serves as the Harness CD platform SRE/Owner, responsible for end-to-end reliability, performance, and modernization across non-prod, prod, and BCP environments. The role drives automation-first operations, implements observability and alerting, integrates with CI/CD ecosystems (GitHub, Jenkins, Azure DevOps, Kubernetes/OpenShift, cloud providers), and partners with Security to embed DevSecOps controls. This engineer leads incidents and RCAs, manages SLIs/SLOs/error budgets, and continuously improves scalability, resiliency, and developer experience through hardened pipelines and self-service workflows.

Day-to-Day Responsibilities:

Platform Ownership & Reliability (SRE)
- Operate the Harness CD platform across non-prod, prod, and BCP; maintain SLIs, SLOs, error budgets, success rates, and platform health
- Lead incident response, troubleshooting, and RCA for deployment failures, delegate outages, or performance issues
- Identify/remediate scaling & capacity constraints across delegates, pipelines, clusters, and cloud integrations

Automation & Engineering Excellence
- Build automation for provisioning, configuration, scaling, upgrades, and maintenance of Harness components
- Implement IaC using Terraform, Ansible, Helm; automate delegate lifecycle, cluster onboarding, secret rotation, and pipeline validation
- Reduce toil via resilient, repeatable, self-service workflows

DevOps & CI/CD Integration
- Maintain/enhance integrations with GitHub, Jenkins, Azure DevOps, Kubernetes/OpenShift, and cloud providers
- Optimize deployment strategies (blue/green, canary, rolling) for speed and reliability
- Embed DevSecOps controls (policy enforcement, governance pipelines, security checks)

Observability & Monitoring
- Implement monitoring, logging, dashboards, and alerting for all Harness components
- Use Splunk, Prometheus, Grafana, AppDynamics to deliver actionable alerts and reduce MTTD/MTTR
- Detect/escalate issues (delegate saturation, pipeline slowdowns, API failures, K8s resource constraints)

Modernization & Continuous Improvement
- Execute upgrades, hotfixes, patching; evaluate new Harness features & modules
- Drive containerization, cloud-native deployments, multi-cloud expansion
- Support BCP readiness and resiliency validation

Technical Leadership
- Act as SME for Harness platform operations; produce architecture docs, runbooks, and standards
- Mentor and partner with senior engineers to improve patterns and operational excellence

Required Qualifications:
- 5-7+ years in DevOps, SRE, Platform, or Cloud Engineering
- Hands-on Harness CD experience
- Strong Kubernetes/OpenShift, Linux, cloud services, deployment best practices
- Solid grasp of CI/CD workflows and release automation
- SRE practices (SLIs/SLOs, error budgets) and operational maturity
- Automation/scripting (Python, Bash, PowerShell)
- IaC with Terraform, Ansible, Helm (or equivalent)
- Observability tools (Prometheus, Grafana, Splunk/ELK, AppDynamics) and full-stack troubleshooting

Job Expectations:
- Hybrid work schedule (3 days onsite at an approved location)
- On-call support as needed; flexibility for ad-hoc shifts
- 12-month contract with conversion to FTE target after 12 months