Senior Platform/MLOps Engineer
Job Description:
- Design, implement, and maintain reliable, scalable, and secure infrastructure, applications, and tooling, with a focus on our ML/AI pipelines and workloads.
- Write clean, maintainable code and perform peer code reviews.
- Write clear, concise documentation, and engage in cross-team communication and knowledge sharing.
- Work with other team members to investigate design approaches, prototype new technologies, and evaluate technical feasibility.
- Pair with adjacent teams to understand how your frameworks and infrastructure are actually used in the field, continuously improving them and leveraging recent advances to increase developer velocity.

Requirements:
- 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
- B.S. or M.S. degree (or equivalent) in Computer Science, Engineering, or a related field.
- Proficiency in at least one modern programming language (Python, JavaScript, C#, Go, etc.).
- Demonstrated experience applying industry best practices in MLOps.
- Proficiency with CI/CD tools and GitOps workflows.
- Familiarity with running GPU workloads in Kubernetes.
- Strong knowledge of Kubernetes (self-hosted and managed) and modern cloud-native paradigms (e.g., the CNCF ecosystem).
- Proficiency with Infrastructure as Code tools (Terraform, etc.) and configuration management tools (Ansible, etc.).
- Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry).