Site Reliability Engineer
ARCHIVED
SRE Engineer - AI/ML
Sunnyvale, CA or Austin, TX (3 days hybrid)

Required Skills & Qualifications

Core Technical Skills
Strong experience in DevOps / SRE roles
Strong Python programming experience (FastAPI, Flask, etc.)
Hands-on expertise with Linux, shell scripting, and system internals
Strong experience with Kubernetes and Docker in production
Cloud expertise in AWS (EKS, EC2, IAM, S3, Lambda, networking); GCP/Azure is a plus
CI/CD tooling: GitHub Actions, Jenkins, GitLab CI, ArgoCD, etc.

AI/ML & Data Platform Exposure
Practical experience supporting AI/ML or data engineering platforms
Experience with LLMs (OpenAI, Anthropic, Azure OpenAI, or open-source models)
Strong understanding of RAG patterns, including vector search, embeddings, and document pipelines
Familiarity with MLOps concepts and tools (any of MLflow, Kubeflow, SageMaker, Airflow, Ray, Spark)
Understanding of model lifecycle management, deployment strategies, and monitoring