SRE Lead | Diversified Strategies Hedge Fund

[Up to c. $500k Comp Package | Hybrid Working - 3 Days in Office]

Role Overview

We’re representing a global multi-strategy investment firm seeking an SRE Lead to take ownership of reliability engineering across a business-critical technology estate. This role will lead a distributed team across New York and London, improving production stability, observability, operational discipline and reliability standards across demanding front-office and firmwide platforms.

This is a hands-on technical leadership role, not a purely managerial position. The team is experienced, but the next phase requires someone who can bring structure, cohesion and strategic direction - moving the function from a DevOps-leaning model towards a more mature SRE discipline. You’ll need the technical gravitas to command respect from senior engineers, while working constructively with demanding business stakeholders to deliver a high-quality service. Longer term, this is a strong progression opportunity for someone capable of growing into broader platform engineering leadership.

...

Key Responsibilities

- Bring structure to planning, prioritisation, delivery tracking and ownership across the team
- Establish consistent SRE standards across monitoring, incident response, operational readiness and service ownership
- Improve observability, alert quality, routing, metrics and performance visibility across the environment
- Move the team towards a more proactive reliability model, reducing repeat issues and reactive support
- Partner closely with business users, platform teams and engineering groups to improve service quality and resilience
- Lead improvements across Kubernetes operations, including reliability, upgrades, capacity, networking and workload stability
- Own reliability practices around critical distributed systems, including Kafka or similar messaging platforms
- Strengthen automation, CI/CD and GitOps practices using Terraform, Ansible, GitLab and ArgoCD
- Drive technical debt reduction and ensure recurring issues are addressed with durable fixes
- Participate in on-call as a senior escalation point for high-severity production incidents
- Track utilisation, cost and vendor performance across relevant SRE-owned services

What You’ll Bring…

- 8-15 years’ experience across SRE, production engineering, platform reliability or infrastructure engineering
- Proven experience leading senior engineers, either as a formal manager or technical lead
- Strong technical credibility, with the ability to operate at or above the level of an experienced SRE team
- Deep hands-on Kubernetes expertise across production operations, troubleshooting, upgrades, networking, RBAC, capacity and workload reliability
- Strong automation and Infrastructure-as-Code experience using Terraform, Ansible or similar
- Practical coding ability, ideally in Python, for tooling, automation and workflow improvement
- Strong observability background, including monitoring standards, alert quality and incident response processes
- Experience operating distributed systems, ideally Kafka or similar streaming/messaging platforms
- Familiarity with CI/CD and GitOps workflows, ideally with GitLab, ArgoCD or comparable tooling
- Experience across hybrid infrastructure environments, with AWS or similar public cloud exposure
- Strong Linux systems knowledge and broader infrastructure troubleshooting capability
- Opinionated technical judgement, balanced with the ability to bring others along constructively
- Service-oriented mindset, with the ability to support demanding business needs while improving long-term platform quality
- (Preferred) Experience with multi-region or multi-cluster reliability patterns, disaster recovery testing, or continuous service validation
- (Preferred) Background in financial services, trading, large-scale SaaS or other production-critical environments

...