
Senior Platform Engineer

About us

Axiomatic AI is building a new class of AI systems designed to reason with the rigor of the scientific method. By combining deep learning with formal logic and physics-based modeling, we create verifiable, interpretable AI systems that collaborate with and support human researchers in high-stakes scientific and engineering workflows.

Our mission, 30×30, is to deliver a 30× improvement in the speed, accessibility, and cost of semiconductor and photonic hardware development by 2030.

We aim to revolutionize hardware design and simulation in these industries and are building a team of highly motivated professionals to bring these innovations from research into commercial products.

Position Overview

As a Senior Platform Engineer at Axiomatic, you will own the reliability, deployment, and operational excellence of our AI platform. This role focuses primarily on infrastructure, CI/CD, and operations, with additional responsibilities for automation and tooling development.

You will:
- Lead deployment strategies and CI/CD pipelines across multiple environments
- Architect and maintain multi-cloud infrastructure (Azure, AWS, GCP) and on-premise deployments
- Own infrastructure as code using Terraform to automate provisioning and configuration
- Build comprehensive observability systems: monitoring, metrics, logging, and alerting
- Implement security controls, compliance frameworks, and data governance policies
- Develop automation tools, APIs, and scripts (Python) to improve operational efficiency
- Ensure system reliability, performance, and scalability
- Drive incident response, postmortems, and continuous improvement
- Troubleshoot infrastructure and application issues across multiple environments

Your mission

Deployment & CI/CD
- Design and implement deployment pipelines for multi-environment releases (dev, staging, production)
- Own the full deployment lifecycle: build, test, release, and rollback strategies
- Implement blue-green deployments, canary releases, and progressive rollouts
- Build automated deployment tooling and workflows
- Ensure zero-downtime deployments and rollback capabilities
- Optimize build and deployment performance
- Manage artifact repositories and container registries

Infrastructure & Cloud Operations
- Design and operate multi-cloud infrastructure across Azure, AWS, and GCP
- Architect and deploy on-premise solutions for enterprise customers (Linux-based)
- Manage Kubernetes clusters, container orchestration, and networking
- Implement disaster recovery, backup strategies, and business continuity
- Optimize cloud costs and resource utilization
- Define and track SLIs, SLOs, and error budgets for critical services

Infrastructure as Code
- Write and maintain Terraform modules for infrastructure provisioning
- Implement GitOps workflows for infrastructure changes
- Automate infrastructure scaling, updates, and operations
- Ensure reproducible and version-controlled infrastructure

Observability & Monitoring
- Design comprehensive monitoring, logging, and alerting (Prometheus, Grafana, Datadog, or similar)
- Build dashboards for system health, performance, and business metrics
- Implement distributed tracing for microservices
- Conduct capacity planning and performance analysis
- Drive reliability improvements through data-driven insights

Security & Compliance
- Implement security best practices: identity management, secrets management, network policies
- Work towards or maintain security certifications (SOC 2, ISO 27001, or similar)
- Conduct security audits and vulnerability remediation
- Implement data governance policies for AI pipelines and user data
- Ensure compliance with data privacy regulations (GDPR, CCPA)

Automation & Tooling Development
- Write automation scripts and tools in Python for operational tasks
- Build internal tooling for deployments, monitoring, and incident response
- Develop runbooks, automation, and self-healing systems
- Create APIs for infrastructure operations when needed
- Maintain high code quality and testing standards for tooling

Reliability & Incident Management
- Participate in the on-call rotation and lead incident response
- Conduct blameless postmortems and drive action items
- Build and maintain incident response playbooks
- Improve system resilience and failure-mode handling

Collaboration
- Partner with engineering teams on deployment strategies and architecture
- Work with the security team on compliance and governance
- Mentor engineers on operational best practices
- Document systems, procedures, and runbooks

Key requirements
- 7+ years of experience in Platform Engineering, Site Reliability Engineering, DevOps, or Infrastructure Engineering roles
- Deployment expert: Deep experience with CI/CD pipelines, release strategies, and production deployments at scale
- Multi-cloud expertise: Hands-on experience with Azure and AWS required (GCP is a plus)
- On-premise deployment experience: Linux system administration, bare-metal provisioning, networking
- Terraform expert: Deep experience writing and maintaining infrastructure as code
- Observability systems: Proven track record building monitoring, alerting, and metrics platforms
- Security mindset: Experience implementing security controls and best practices. Security certification preferred (CISSP, CEH, AWS/Azure Security Specialty, or similar)
- Data governance: Understanding of data privacy, residency requirements, and governance frameworks
- Backend/scripting skills: Python (preferred) or Go for automation, tooling, and operational scripts
- Kubernetes and container orchestration in production
- Strong Linux/Unix administration and scripting (Bash, Python)
- CI/CD platforms: GitHub Actions, GitLab CI, Jenkins, or similar
- Version control and GitOps practices
- Strong problem-solving and debugging skills
- Fluent in English (Spanish is a plus)

Nice-to-Have
- Python proficiency for automation and internal tooling
- Experience with cloud AI platforms (Vertex AI, Azure ML, AWS SageMaker)
- Service mesh experience (Istio, Linkerd) or API gateways
- Experience with GPU workloads and ML infrastructure
- FinOps and cloud cost optimization
- Compliance frameworks experience (SOC 2, ISO 27001, HIPAA, FedRAMP)
- Database operations: PostgreSQL, Redis administration
- Experience with FastAPI or similar frameworks for internal tools
- Contributions to open-source infrastructure projects
- Background in hardware or semiconductor industries

Work model & location expectations
- Team work model: Hybrid. Open to remote.
- Primary location: Barcelona / Boston
- On-site expectations: This role can be hybrid or remote. For hybrid arrangements, we would expect ~2 days per week in the office (with flexibility). Occasional travel to our Barcelona or Boston office may be required if remote.

Hiring Manager note: Given that a significant portion of the team works remotely, we are open to flexible working arrangements, including hybrid or fully remote options depending on location and team needs.

Why join us?

At Axiomatic_AI, you will be working on technology that drives innovation in AI for scientific and engineering applications in line with our 30×30 mission.

This is your opportunity to contribute to the development of new AI architectures that can reason coherently and produce interpretable and verifiable solutions. You will then see those ideas commercialized into products that will shape the future of hardware and computing, while collaborating with a global team of engineers and AI specialists.

We believe in pushing the boundaries of what is possible in AI, with a focus on formal consistency. If you're ready to take your expertise in artificial intelligence and physics to the next level, we want to hear from you!
