Infrastructure Engineer
Infrastructure Engineer (AI Infrastructure - DevTool Start-Up)
$150,000 – $200,000 + Equity + Benefits + PTO
On-site in San Francisco, CA

Are you passionate about keeping production AI infrastructure shipping fast, securely, and without breaking? Do you thrive in environments where you directly own the release systems that the world's largest companies depend on to deploy language models in production?

This is an opportunity to join a fast-growing, profitable startup at the forefront of AI infrastructure, owning the release engineering layer that powers how a market-leading LLM gateway gets into the hands of customers safely and reliably. Backed by top-tier investors and trusted by major enterprises, this team has built a unified LLM gateway used as a critical proxy by engineering teams worldwide. Now, they're looking for a founding DevOps engineer to own release infrastructure, release security, and the end-to-end process that keeps shipping velocity high without compromising stability.

As an early member of the engineering team, you'll take ownership of the systems that move code from commit to production artifact, architecting secure release pipelines, debugging test failures, and making sure that every release that goes out is one the team can stand behind. You'll work directly with senior leadership and ensure that as the platform scales, the release process scales with it.

If you're looking for a role where you can combine deep infrastructure ownership with real security responsibility and directly influence how one of the most widely deployed AI tools in the world gets shipped, this is an outstanding opportunity.

The Role
Own secure, regular releases including 2 nightly releases and 1 stable release per week
Manage and improve release infrastructure including Helm, Terraform, CI/CD, and the developer systems needed to keep releases stable
Investigate test failures and determine whether they are true regressions, flaky tests, or dead tests that should be fixed or removed
Write Python to fix minor test issues, improve release reliability, and support developer workflows
Architect and implement a secure release process across build, test, approval, and publish steps
Work closely with the engineering team to improve release quality, reduce operational risk, and keep shipping velocity high

The Person
1+ years of experience in infrastructure engineering, distributed systems, release engineering, or related systems work
Proficient in Python and comfortable making code changes in test and release systems
Experience with Terraform, Helm, CI/CD systems, and cloud infrastructure
Strong judgment around release reliability, testing, and debugging
Able to distinguish between real regressions and flaky infrastructure or test behaviour
Able to design secure release processes including access controls, secrets handling, and safe publishing workflows
Excited to work in an early-stage, high-ownership, fast-shipping environment