Sr. Site Reliability Engineer (Compute Platform)

3B Staffing
Dallas, TX
May 23rd, 2026

Computer Systems Engineers/Architects
Computer Systems Design and Related Services

Job Title: Sr. Site Reliability Engineer (Compute Platform)
Visa: USC only
Salary: $60/hr
Job Type: Contract, with potential for contract-to-hire. The client will only consider candidates who are willing to convert to full-time employment for this role and who do not require any type of sponsorship.
Worksite Requirement: Fully remote
Interview Process: 2-3 rounds of video conference interviews

Job Summary:
We are seeking a highly experienced Sr. Site Reliability Engineer (Compute Platform) to design, implement, and support Kubernetes on bare metal and hypervisor platforms in a private cloud environment. This role is responsible for the architecture, design, and standardization of enterprise compute and hypervisor environments spanning bare metal infrastructure, operating systems, hypervisors, private cloud orchestration, and Kubernetes, using Infrastructure-as-Code and GitOps practices. This is a deeply technical role requiring expert-level understanding of compute hardware management, Kubernetes, OpenStack, and hypervisors, along with extensive working knowledge of Linux operating systems.
You will also collaborate with platform and SRE teams to maintain secure, performant, multi-tenant-isolated services that serve high-throughput, mission-critical applications.

Key Responsibilities:
- Lead the architecture and design of enterprise compute and hypervisor platform solutions across the hardware, OS, virtualization, cloud orchestration, and container orchestration layers
- Define standards and automation frameworks for bare metal provisioning and lifecycle management
- Design and implement Bare Metal as a Service (BMaaS) capabilities for scalable infrastructure consumption
- Architect and design Kubernetes platforms on bare metal with QoS and affinity (ArgoCD)
- Architect and validate automated deployments of operating systems and hypervisors, including Ubuntu and Harvester
- Design and maintain PXE-based provisioning environments leveraging Redfish APIs for large-scale server deployments
- Develop Infrastructure-as-Code using Ansible, Terraform, Helm, and Git, with Python/Bash automation
- Implement CI/CD pipelines for infrastructure updates, patching, upgrades, testing, and rollback
- Design automated workflows for server builds, firmware lifecycle management, patching, and hardware validation
- Evaluate and standardize enterprise hardware platforms to meet performance, scalability, and reliability requirements
- Produce detailed high-level and low-level design documentation, build guides, and operational handoff materials
- Perform deep troubleshooting across storage, Kubernetes, hypervisors, networking, and Linux systems
- Partner with operations, network, storage, and platform teams to ensure designs are supportable and production-ready
- Participate in on-call escalation support for complex platform-related issues
- Collaborate globally on change management, documentation, and operational best practices
Must Have:
- 8+ years of experience in infrastructure engineering, platform engineering, or DevOps with a strong focus on compute system design
- Proven experience designing and automating bare metal compute environments at scale
- Strong hands-on experience with PXE boot, network-based OS provisioning, and automated server imaging
- Experience implementing or supporting Bare Metal as a Service (BMaaS) platforms
- Practical experience using Redfish APIs for hardware provisioning, power management, and remote lifecycle operations
- Deep expertise with Ubuntu Linux in enterprise environments
- Strong hands-on experience with KVM hypervisors (SUSE Harvester, OpenStack)
- Experience designing and deploying production-grade Kubernetes clusters
- Strong background with enterprise compute hardware platforms, including Cisco UCS, Dell PowerEdge, Supermicro, and HPE systems
- Proficiency with Infrastructure-as-Code tools (e.g., Terraform, Ansible, or similar)
- Experience building or supporting CI/CD pipelines for infrastructure and platform automation
- Strong scripting skills in Python, Bash, or similar languages
- OpenStack and Ubuntu KVM administration
- Bare Metal as a Service (PXE, Redfish)
- Kubernetes on bare metal
- CIS/NIST security and infrastructure lifecycle management
- ITIL Foundation/advanced certifications in support of ITSM standard methodology
- Background in telco, edge cloud, or large enterprise environments
- Ubuntu certifications, CNCF Certified Kubernetes Administrator (CKA), Certified Kubernetes Security Specialist (CKS)
- Master's degree in computer science, IT, engineering, or a related field preferred; equivalent experience and relevant industry certifications will also be considered
