Senior Platform Engineer
Occupations:
Computer Systems Engineers/Architects
Software Developers
Network and Computer Systems Administrators
Computer Occupations, All Other
Computer Systems Analysts

Industries:
Software Publishers
Continuing Care Retirement Communities and Assisted Living Facilities for the Elderly
Shoe Retailers
Computer Systems Design and Related Services
Vocational Rehabilitation Services

Title: Senior or Staff Platform Engineer
Location: FULLY remote!
Salary: $175k-$275k base + RSUs + Full Benefits
Requirements: 3+ years in Systems Engineering or HPC Infrastructure, strong Linux and bare-metal GPU experience, NVIDIA DGX/HGX, InfiniBand/RoCE, and automation with Python or Go

We build the high-performance, bare-metal GPU infrastructure that powers modern AI. Our team designs and operates large-scale NVIDIA DGX/HGX clusters, high-speed networking, and the automation that turns complex hardware into a reliable, production-ready platform. We work directly with the metal: provisioning nodes, tuning Linux, integrating InfiniBand/RoCE, and building the tooling that enables fast, secure, and scalable AI workloads.

If you want to help shape the systems that make large-scale AI possible, this is where you will do it. We are looking for a Senior or Staff-level Platform Engineer to architect and operate the high-performance GPU infrastructure that powers next-generation AI systems.
This is not a traditional cloud role - you will own the full lifecycle of bare-metal GPU clusters, from "empty rack" to production-grade Kubernetes, and build the automation that makes large-scale AI infrastructure reliable, observable, and secure.

If you thrive at the intersection of hardware, distributed systems, and automation - and you love solving the problems that live between teams - you will feel right at home here.

What You'll be Doing
Design and operate container orchestration platforms optimized for NVIDIA DGX/HGX-class hardware.
Build bare-metal provisioning systems (PXE, Ironic, MAAS) to bring GPU clusters online at scale.
Manage GPU lifecycle: driver stacks, CUDA/kernel compatibility, MIG slicing, and performance tuning.
Partner with Network Engineering and DCOps to align physical infrastructure with software orchestration.
Build automation and internal tooling in Go or Python to streamline cluster operations.
Implement Terraform/Ansible-based IaC for fully auditable, repeatable infrastructure.
Design high-resolution observability stacks (Prometheus/Grafana, DCGM, VictoriaMetrics).
Participate in a specialized on-call rotation supporting GPU workloads and core platform services.

What You Need for this Position
7+ years in systems, platform, or distributed systems engineering (10+ for Staff).
Expert-level Linux knowledge: kernel modules, sysctl tuning, hugepages, container runtimes.
Hands-on experience bootstrapping Kubernetes or SLURM on physical hardware.
Strong proficiency in Go (preferred) or Python for systems-level automation.
Deep familiarity with NVIDIA GPU ecosystems (drivers, CUDA, MIG).
Working knowledge of InfiniBand or RoCEv2 networking and NCCL performance tuning.
Experience building observability pipelines for hardware-accelerated environments.
Ability to troubleshoot complex, multi-layered issues across hardware, networking, and orchestration.
Strong cross-team communication - you're the "glue" between Network, DCOps, and Software.

Bonus Points
Experience with SLURM, Kubeflow, or distributed PyTorch.
Integrating vendor APIs (NetBox, Vault, GitLab CI, etc.) into unified workflows.
Infrastructure testing, chaos engineering, or cluster-level integration test suites.
Designing telemetry aggregation across hardware, networking, and environmental systems.

What's In It for You
$175k - $275k/year DOE
RSUs
5 weeks PTO
401k w/ match
Comprehensive Benefit Plan

Email Your Resume In Word To
Abi.Harper@CyberCoders.com

Looking forward to receiving your resume through our website and going over the position with you. Clicking apply is the best way to apply.

Please do NOT change the email subject line in any way. You must keep the JobID: linkedin : AH12-1987058 -- in the email subject line for your application to be considered.

Abi Harper - Lead Recruiter

For this position, you must be currently authorized to work in the United States without the need for sponsorship for a non-immigrant visa. This is a new role.

This job was first posted by CyberCoders on 05/14/2026 and applications will be accepted on an ongoing basis until the position is filled or closed.

This job was posted on 05/14/2026 and is open for 60 days.

Everforth CyberCoders is proud to be an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, sexual orientation, gender identity or expression, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, status as a crime victim, disability, protected veteran status, or any other characteristic protected by law. Our hiring process includes AI screening for keywords and minimum qualifications. Recruiters review all results.
Everforth CyberCoders will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable state and local law, including but not limited to the Los Angeles County Fair Chance Ordinance, the San Francisco Fair Chance Ordinance, and the California Fair Chance Act.

Everforth CyberCoders is committed to working with and providing reasonable accommodation to individuals with physical and mental disabilities. Individuals needing special assistance or an accommodation while seeking employment can contact a member of our Human Resources team at Benefits@CyberCoders.com to make arrangements.

Copyright © 2026 Everforth, Inc. All rights reserved.