HPC Operations Engineer
Jump Trading Group is committed to world-class research. We empower exceptional talents in Mathematics, Physics, and Computer Science to seek scientific boundaries, push through them, and apply cutting-edge research to global financial markets. Our culture is unique. Constant innovation requires fearlessness, creativity, intellectual honesty, and a relentless competitive streak. We believe in winning together and unlocking unique individual talent by incenting collaboration and mutual respect. At Jump, research outcomes drive more than superior risk-adjusted returns. We design, develop, and deploy technologies that change our world, fund start-ups across industries, and partner with leading global research organizations and universities to solve problems.

We are looking for an adaptable, hands-on individual who is passionate about the details and nuances of managing Linux HPC environments at scale and eager to tackle complex and unpredictable operational work as their primary job function.

What You'll Do:
- Provide front-line operational support for 24/7 Linux HPC compute, storage, and interconnects. Technologies involved include RDMA fabrics, parallel filesystems, HPC batch schedulers, FUSE filesystems, internal Jump software, multi-vendor hardware, cybersecurity requirements, a challenging and unpredictable client workload, and high user expectations
- Solve problem reports and questions posed by members of Jump's research community, escalating as needed and managing the entire problem lifecycle
- Respond to alerts in a timely fashion
- Participate in large, coordinated maintenance operations, including during evenings and weekends
- Work on global projects across a wide range of infrastructure
- Write code for diagnosing, resolving, and triaging difficult problems and automating frequently performed tasks
- Collaborate with team members and across teams to write code and testing infrastructures spanning both new and existing codebases in multiple programming languages
- Manage relationships with outside vendors, including traveling both domestically and internationally to meet with current and potential vendors
- Implement and support performance monitoring and fault monitoring systems
- Develop and improve systems and user documentation
- Develop and monitor the tools used to maintain a production computing environment
- Provide operational support as primary job function
- Adhere to all company cybersecurity and IT policies, including performing all work using only approved hardware and software
- Participate in an on-call rotation
- Other duties as assigned or needed
- Work from company office an average of 5 days a week
- Must be willing to work a maintenance window of either Friday evening or Saturday morning

Skills You'll Need:
- A desire for operational work as primary job function
- 2+ years of professional experience with Linux systems
- 2+ years of professional experience working with high-performance computing (HPC), including parallel filesystems (e.g., Lustre, GPFS), batch systems (e.g., Slurm, Grid Engine), and high-performance network interconnects, is a plus but not required
- High proficiency with at least one programming/scripting language (e.g., Go, Python, C) and ability to learn additional languages quickly
- Ability to perform root cause analysis
- Strong verbal and written communication skills, including the ability to communicate effectively and efficiently with both coworkers and third-party vendors
- Strong collaboration skills with a willingness to undertake tasks of various technologies and complexities
- Ability to independently manage complex projects and multiple workstreams
- Strong sense of urgency
- Willingness to perform regular operational maintenance work during evenings and weekends and as needed
- Ability to work effectively in a busy, open floor plan office environment
- Reliable and predictable availability

This role will sit onsite in Chicago, IL.

Benefits
- Discretionary bonus eligibility
- Medical, dental, and vision insurance
- HSA, FSA, and Dependent Care options
- Employer Paid Group Term Life and AD&D Insurance
- Voluntary Life & AD&D insurance
- Paid vacation plus paid holidays
- Retirement plan with employer match
- Paid parental leave
- Wellness Programs

Annual Base Salary Range
$150,000 - $175,000 USD