Graphics Processing Unit (GPU) Engineer with Security Clearance
Job Description

Base-2 Solutions is seeking a highly skilled Systems Engineer with deep expertise in operating systems, hardware, GPUs, and high-speed networking. In this role, you will design, develop, and optimize GPU clusters that power enterprise AI for mission customers.

Primary Responsibilities
* GPU Cluster Engineering: Design, configure, and maintain GPU clusters. Collaborate with a multidisciplinary team to define and optimize architectures, ensuring they meet performance, power efficiency, and feature requirements.
* Operating System Integration: Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems. Optimize GPU drivers for compatibility, reliability, and performance. Provide regular maintenance and updates.
* Performance Optimization: Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers.
* Tooling and Automation: Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments. Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt.
* Compliance & Documentation: Maintain technical documentation, architectural specifications, and Linux best practices. Support ATO (Authority to Operate) and ensure compliance with federal security standards.

Qualifications
* Bachelor's or higher degree in Computer Science, Electrical Engineering, or a related field.
* Additional years of experience may be considered in lieu of a degree.
* 10+ years of relevant systems engineering experience.
* Experience managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4s).
* Knowledge of enterprise server components (storage/network controllers, HBAs, SSDs).
* Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle, and Rocky).
* Excellent problem-solving skills and the ability to collaborate within a team.
* Candidate must, at a minimum, meet DoD 8570.11 - IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP along with an appropriate computing environment (CE) certification). An IAT Level III certification would also be acceptable (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP).

Clearance
* TS/SCI clearance with Polygraph required, or a TS/SCI and willingness to obtain a Polygraph prior to starting.

Preferred Qualifications
* Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow).
* Familiarity with GPU virtualization and cloud computing.
* Experience with Prometheus/Grafana for monitoring.
* Knowledge of distributed resource scheduling systems (Slurm (preferred), LSF, etc.).

Pay & Benefit Highlights

Compensation
* Competitive fixed salary or hourly pay (based on experience, skills, location, and internal equity).
* Employee referral bonuses up to $10,000 per hired referral.
* Additional bonus opportunities for exceptional performance and contributions to business development and company growth (role-dependent).

Health
* 100% company-paid medical premiums for employees and eligible dependents.
* Choose from multiple plan options with CareFirst, Kaiser, and UnitedHealthcare, including PPO, POS, HMO, and HSA-compatible plans.
* 100% company-paid dental premiums for employees and eligible dependents.
* 100% company-paid vision premiums for employees and eligible dependents.

Income Protection
* 100% company-paid premiums for short-term disability.
* 100% company-paid premiums for long-term disability.
* 100% company-paid premiums for accidental death & dismemberment (AD&D).
* 100% company-paid premiums for life insurance up to $200,000.

Retirement
* 401(k) with immediate vesting: 4% company match plus a 4% non-elective company contribution (8% total).
* 401(k) pre-tax and Roth options.

Leave
* Up to 20 days of flexible paid time off (PTO).
* 11 paid floating holidays.

Work-Life Balance
* Flexible work schedules, including flex time and compressed work periods (contract and project-dependent).