{"schemaVersion":"jobsearcher.job.v1","id":"ac6a129ec6bbae1bc9217868","url":"https://jobsearcher.com/jobs/ac6a129ec6bbae1bc9217868","canonicalUrl":"https://jobsearcher.com/jobs/ac6a129ec6bbae1bc9217868","title":"Software Engineer, Infrastructure Platform","description":"At Docker, we make app development easier so developers can focus on what matters. Our remote-first team spans the globe, united by a passion for innovation and great developer experiences. With over 20 million monthly users and 20 billion image pulls, Docker is the #1 tool for building, sharing, and running apps, trusted by startups and Fortune 100s alike. We're growing fast and just getting started. Come join us for a whale of a ride!\r\nOur Infrastructure Engineering team builds and operates the cloud-native platform that powers Docker's suite of products. We design resilient services, automate where it helps most, and measure what matters so hundreds of engineers can ship safely to millions of users every day.\r\nA core focus is self-service. We build paved-road platform capabilities that let internal teams provision, deploy, observe, and operate services with minimal friction and strong guardrails. We treat the platform as a product with clear contracts, well-defined defaults, and great documentation. 
Success is measured by adoption and fewer support requests.\r\nHow We Work\r\nWrite it down, ship it, iterate: RFCs and design docs, code review, and small safe releases.\r\nSustainable reliability: we prioritize root-cause fixes, good alerts, and automation over heroics.\r\nCross-functional by default: we partner closely with product and security teams.\r\nAI-accelerated execution: we build agentic workflows to reduce toil and improve incident response, with guardrails, auditability, and human review.\r\nWhat You'll Work On\r\nReducing toil through automation, including AI-assisted and agentic operational workflows.\r\nBuilding self-service onboarding and deployment workflows that reduce tickets and speed delivery.\r\nScaling Kubernetes foundations and evolving our traffic and ingress stack.\r\nResponsibilities\r\nSelf-Service Platform Services\r\nBuild and operate internal platform services and APIs in Go, including provisioning, quotas and policies, cost insights, and platform workflows.\r\nDeliver golden paths for self-serve onboarding and day-2 operations, including access, deployment setup, observability defaults, and governance guardrails.\r\nPartner with teams to drive adoption through clear docs, examples, and measurable outcomes.\r\nInfrastructure as Code and Reliability\r\nCodify infrastructure with Terraform and GitOps practices, and contribute to platform tooling in Go.\r\nDefine and improve SLOs, alerting, and operational readiness. 
Participate in incident response and preventive follow-ups.\r\nHelp standardize safe delivery patterns, including testing gates, canaries, and rollback triggers, so deployments are routine and low-risk.\r\nKubernetes and Networking Foundations\r\nOperate and scale multi-tenant EKS clusters and traffic and ingress systems to deliver secure, reliable routing.\r\nEvaluate and adopt improvements with a bias toward incremental rollout and measurable impact.\r\nAI and Agentic Workflows for Reliability\r\nBuild and iterate on agentic workflows that reduce operational toil, including triage support, context gathering, safe runbook execution, and remediation suggestions.\r\nIntegrate automation into delivery and operations in a way that is safe, observable, and auditable.\r\nOn-Call and Incident Response\r\nYou'll join an on-call rotation after onboarding and shadowing, and participate in incident response during your shifts.\r\nWe aim for sustainable on-call through good alerting, automation, and blameless postmortems focused on prevention.\r\nQualifications\r\n4+ years of backend software engineering experience building large-scale cloud or distributed systems.\r\nStrong software development skills in Go or a similar language, including design, testing, debugging, and code review.\r\nExperience shipping and operating cloud services in production, often 3+ years. 
We hire for skill and impact, not years alone.\r\nSolid foundation in Linux, networking fundamentals, and cloud security.\r\nExperience building operational automation, including AI-assisted or agentic workflows, with an emphasis on safety, guardrails, and auditability.\r\nClear written and verbal communication in a remote environment, including RFCs, incident writeups, and async collaboration.\r\nNice-to-have\r\nKubernetes and EKS experience, plus ingress, CNI, service mesh, and familiarity with L4 and L7 load balancing.\r\nObservability tooling such as OpenTelemetry, Prometheus, and Grafana, plus alerting and SLO practice.\r\nCI/CD and progressive delivery, including GitHub Actions or Argo CD, canaries, and automated rollback.\r\nCost optimization at scale, including FinOps and capacity modeling.\r\nDistributed systems, containers, and Go-based platform tooling.\r\nWhat To Expect\r\nFirst 30 Days\r\nShip your first change to a Terraform module or internal service and learn how we operate.\r\nShadow on-call and build context on our platform and reliability priorities.\r\nFirst 90 Days\r\nOwn a component and deliver an improvement from design to production with measurable impact.\r\nJoin the on-call rotation and contribute effectively during your shifts.\r\nFirst Year\r\nLead or co-lead a meaningful platform initiative, with scope that scales by level, and help reduce toil through automation.\r\nBecome a trusted contributor in one or more areas such as platform services, Kubernetes and networking foundations, or reliability automation.\r\nDocker considers sponsorship on a case-by-case basis based on business needs.\r\nWe use Covey as part of our hiring and/or promotional process for jobs in NYC, and certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 13, 2024. 
Please see the independent bias audit report covering our use of Covey here.\r\nPerks\r\nFreedom & flexibility; fit your work around your life\r\nDesignated quarterly Whaleness Days plus end-of-year Whaleness break\r\nHome office setup; we want you comfortable while you work\r\n16 weeks of paid parental leave\r\nTechnology stipend equivalent to $100 net/month\r\nPTO plan that encourages you to take time to do the things you enjoy\r\nTraining stipend for conferences, courses and classes\r\nEquity; we are a growing start-up and want all employees to have a share in the success of the company\r\nDocker Swag\r\nMedical benefits, retirement and holidays vary by country\r\nRemote-first culture, with offices in Seattle and Paris\r\nDocker embraces diversity and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better our company will be.\r\nCompensation Range: €72,064 - €123,750","company":"Docker","rawCompany":"docker","city":"New York","state":"NY","isRemote":false,"isActive":true,"createdAt":"2026-05-27T00:33:28.755Z","occupations":[{"code":"15-1299.08","title":"Computer Systems Engineers/Architects","slug":"computer-systems-engineers-architects"},{"code":"15-1252.00","title":"Software Developers","slug":"software-developers"},{"code":"15-1244.00","title":"Network and Computer Systems Administrators","slug":"network-and-computer-systems-administrators"}],"industries":[{"code":"513210","title":"Software Publishers","slug":"software-publishers"},{"code":"541512","title":"Computer Systems Design Services","slug":"computer-systems-design-services"},{"code":"541511","title":"Custom Computer Programming Services","slug":"custom-computer-programming-services"}],"jobPosting":{"@context":"https://schema.org","@type":"JobPosting","title":"Software Engineer, Infrastructure Platform","description":"At Docker, we make app development easier so developers can 
focus on what matters. Our remote-first team spans the globe, united by a passion for innovation and great developer experiences. With over 20 million monthly users and 20 billion image pulls, Docker is the #1 tool for building, sharing, and running apps, trusted by startups and Fortune 100s alike. We're growing fast and just getting started. Come join us for a whale of a ride!\r\nOur Infrastructure Engineering team builds and operates the cloud-native platform that powers Docker's suite of products. We design resilient services, automate where it helps most, and measure what matters so hundreds of engineers can ship safely to millions of users every day.\r\nA core focus is self-service. We build paved-road platform capabilities that let internal teams provision, deploy, observe, and operate services with minimal friction and strong guardrails. We treat the platform as a product with clear contracts, well-defined defaults, and great documentation. Success is measured by adoption and fewer support requests.\r\nHow We Work\r\nWrite it down, ship it, iterate: RFCs and design docs, code review, and small safe releases.\r\nSustainable reliability: we prioritize root-cause fixes, good alerts, and automation over heroics.\r\nCross-functional by default: we partner closely with product and security teams.\r\nAI-accelerated execution: we build agentic workflows to reduce toil and improve incident response, with guardrails, auditability, and human review.\r\nWhat You'll Work On\r\nReducing toil through automation, including AI-assisted and agentic operational workflows.\r\nBuilding self-service onboarding and deployment workflows that reduce tickets and speed delivery.\r\nScaling Kubernetes foundations and evolving our traffic and ingress stack.\r\nResponsibilities\r\nSelf-Service Platform Services\r\nBuild and operate internal platform services and APIs in Go, including provisioning, quotas and policies, cost insights, and platform workflows.\r\nDeliver golden paths for 
self-serve onboarding and day-2 operations, including access, deployment setup, observability defaults, and governance guardrails.\r\nPartner with teams to drive adoption through clear docs, examples, and measurable outcomes.\r\nInfrastructure as Code and Reliability\r\nCodify infrastructure with Terraform and GitOps practices, and contribute to platform tooling in Go.\r\nDefine and improve SLOs, alerting, and operational readiness. Participate in incident response and preventive follow-ups.\r\nHelp standardize safe delivery patterns, including testing gates, canaries, and rollback triggers, so deployments are routine and low-risk.\r\nKubernetes and Networking Foundations\r\nOperate and scale multi-tenant EKS clusters and traffic and ingress systems to deliver secure, reliable routing.\r\nEvaluate and adopt improvements with a bias toward incremental rollout and measurable impact.\r\nAI and Agentic Workflows for Reliability\r\nBuild and iterate on agentic workflows that reduce operational toil, including triage support, context gathering, safe runbook execution, and remediation suggestions.\r\nIntegrate automation into delivery and operations in a way that is safe, observable, and auditable.\r\nOn-Call and Incident Response\r\nYou'll join an on-call rotation after onboarding and shadowing, and participate in incident response during your shifts.\r\nWe aim for sustainable on-call through good alerting, automation, and blameless postmortems focused on prevention.\r\nQualifications\r\n4+ years of backend software engineering experience building large-scale cloud or distributed systems.\r\nStrong software development skills in Go or a similar language, including design, testing, debugging, and code review.\r\nExperience shipping and operating cloud services in production, often 3+ years. 
We hire for skill and impact, not years alone.\r\nSolid foundation in Linux, networking fundamentals, and cloud security.\r\nExperience building operational automation, including AI-assisted or agentic workflows, with an emphasis on safety, guardrails, and auditability.\r\nClear written and verbal communication in a remote environment, including RFCs, incident writeups, and async collaboration.\r\nNice-to-have\r\nKubernetes and EKS experience, plus ingress, CNI, service mesh, and familiarity with L4 and L7 load balancing.\r\nObservability tooling such as OpenTelemetry, Prometheus, and Grafana, plus alerting and SLO practice.\r\nCI/CD and progressive delivery, including GitHub Actions or Argo CD, canaries, and automated rollback.\r\nCost optimization at scale, including FinOps and capacity modeling.\r\nDistributed systems, containers, and Go-based platform tooling.\r\nWhat To Expect\r\nFirst 30 Days\r\nShip your first change to a Terraform module or internal service and learn how we operate.\r\nShadow on-call and build context on our platform and reliability priorities.\r\nFirst 90 Days\r\nOwn a component and deliver an improvement from design to production with measurable impact.\r\nJoin the on-call rotation and contribute effectively during your shifts.\r\nFirst Year\r\nLead or co-lead a meaningful platform initiative, with scope that scales by level, and help reduce toil through automation.\r\nBecome a trusted contributor in one or more areas such as platform services, Kubernetes and networking foundations, or reliability automation.\r\nDocker considers sponsorship on a case-by-case basis based on business needs.\r\nWe use Covey as part of our hiring and/or promotional process for jobs in NYC, and certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 13, 2024. 
Please see the independent bias audit report covering our use of Covey here.\r\nPerks\r\nFreedom & flexibility; fit your work around your life\r\nDesignated quarterly Whaleness Days plus end-of-year Whaleness break\r\nHome office setup; we want you comfortable while you work\r\n16 weeks of paid parental leave\r\nTechnology stipend equivalent to $100 net/month\r\nPTO plan that encourages you to take time to do the things you enjoy\r\nTraining stipend for conferences, courses and classes\r\nEquity; we are a growing start-up and want all employees to have a share in the success of the company\r\nDocker Swag\r\nMedical benefits, retirement and holidays vary by country\r\nRemote-first culture, with offices in Seattle and Paris\r\nDocker embraces diversity and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better our company will be.\r\nCompensation Range: €72,064 - €123,750","datePosted":"2026-05-27T00:33:28.755Z","dateModified":"2026-05-27T00:33:28.755Z","hiringOrganization":{"@type":"Organization","name":"Docker","sameAs":"https://jobsearcher.com"},"jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York","addressRegion":"NY","addressCountry":"US"}},"identifier":{"@type":"PropertyValue","name":"JobSearcher","value":"ac6a129ec6bbae1bc9217868"},"url":"https://jobsearcher.com/jobs/ac6a129ec6bbae1bc9217868"}}