Site Reliability Engineer
The Role:
We’re looking for an experienced Site Reliability Engineer (SRE) to take ownership of our production systems’ availability, latency, performance, and capacity. In this role, you’ll apply your expertise in automation, monitoring, and resilient system design to maintain and improve our critical, large-scale infrastructure.
You will:
Respond to customer support requests and participate in our 24/7 support rotation
Maintain internal documentation and deployment playbooks
Modify and test server configurations, then deploy to production
Monitor infrastructure and respond to alerts
Automate tasks using tools like Ansible, Terraform, and Nomad
Contribute to internal tooling and platform improvements
Stay current with changes in the protocols and tooling we support
Other duties as assigned
Although the focus of this role is SRE work, there’s room to grow into other areas depending on your strengths — whether that’s platform engineering, networking, or data system architecture.
Requirements:
6+ years of relevant work experience in systems or infrastructure roles
Strong experience with Ansible
Experience with Prometheus, Grafana, and related monitoring tools
Solid understanding of networking and Linux-based systems
Hardware knowledge and experience managing physical or cloud-based fleets
Nice to Have:
Knowledge of Kubernetes
Experience with blockchain technology
Familiarity with the HashiCorp stack: Nomad, Consul, Vault
Experience with HAProxy or similar load balancing software
Programming experience in Go, Rust, or Python
Don’t meet all the “nice to have” criteria? Don’t let that stop you! Let us know in your application where you’d still need to get up to speed. The most important thing to us is that you love taking on big challenges and learning new skills while solving problems.
Compensation Range: $105K - $135K