Senior Systems Engineer
Position Title: Senior Systems Engineer (Linux / HPC Environment)

Location Information: Washington, D.C.

Position Responsibilities:
As a Senior Systems Engineer, you will play a critical role in supporting, administering, and maintaining a Linux-based high-performance computing (HPC) environment that underpins advanced analytics, statistical modeling, and research activities. Your primary goal will be to ensure system reliability, security, and top-tier performance while working with cross-functional teams to deliver scalable technical solutions for evolving business needs.

Key areas of responsibility include:

System Administration:
Administer and maintain Linux-based HPC systems.
Perform regular system updates, patch management, and robust security hardening.
Monitor, tune, and optimize system performance to ensure high availability and efficiency.

Platform Support:
Provide advanced (Tier 3) technical support for complex HPC platform issues.
Troubleshoot and resolve system outages or performance issues with minimal downtime.
Interpret business and analytical requirements into workable technical solutions.

Collaboration & Communication:
Partner closely with data engineers, data scientists, analysts, and various stakeholders to understand and address their technology needs.
Document system configurations, processes, troubleshooting steps, and incident resolutions.
Drive knowledge sharing and support continuous process improvement activities.

Security & Compliance:
Implement and maintain security best practices, protocols, and regular audits.
Conduct vulnerability assessments to mitigate risks and protect sensitive data.
Ensure all systems adhere to organizational and regulatory compliance standards.

Project & Engineering Support:
Engage in system enhancements, upgrades, and performance initiatives to keep pace with technology advances.
Support system architecture and design decisions for both new and existing platforms.
Assist with the implementation and integration of new tools, features, and capabilities.

On-Call Support:
Participate in an on-call rotation to support critical systems and ensure maximum uptime.

Essential Skills, Experience:
Extensive hands-on experience in Linux system administration, including scripting with Bash or other shells.
Proficiency with automation frameworks such as Ansible (including Ansible Automation Platform) for configuration management and deployment.
Background supporting high-performance computing environments, such as systems utilizing SLURM workload manager or Open OnDemand portals.
Understanding of analytical and statistical software tools such as Python, R, MATLAB, SAS, or similar platforms.
Exceptional troubleshooting and root-cause analysis skills, with the ability to resolve complex technical issues under pressure.
Highly effective communication skills, enabling collaboration with both technical specialists and business teams.
Commitment to security best practices and experience with vulnerability assessments in enterprise environments.
Strong documentation skills with a focus on process consistency and incident management.

Qualifications:
Bachelor's degree in Computer Science, Information Technology, Engineering, or a relevant technical field (or equivalent experience).
Prior experience in system engineering roles within computational, research, or analytics-driven organizations is strongly preferred.
U.S. Citizenship is required due to ongoing project needs.
Ability to work onsite as required; onsite engagement is full-time unless otherwise specified.
Willingness to participate in on-call rotations to ensure system uptime and reliability.