Senior Principal Engineer Site Reliability
Full-time
- Join us to do the best work of your career and make a profound social impact as a Senior Principal Engineer - Site Reliability Engineering on our Service Delivery Team in Austin, Texas.
- The Senior Principal Engineer - Site Reliability Engineering, supporting Artificial Intelligence/Machine Learning/High Performance Compute Solutions within Service Delivery, will be responsible for the primary management, administration, support, and ongoing maintenance of customer platforms within a 24x7x365 datacenter environment.
- Manage and maintain container platform (Kubernetes, OpenShift) infrastructure, including installation, configuration, and upgrades, and optimize the system performance, capacity, and availability of the environment.
- Hands-on experience working in an infrastructure managed services environment, supporting complex engineered solutions in production with Artificial Intelligence/Machine Learning/High Performance Compute systems and platforms and Converged/Hyper-Converged infrastructure, along with fluency in AI/ML pipelines, Nvidia GPU optimization, InfiniBand networking, machine learning operating systems such as cnvrg.io, and compute orchestration platforms such as Run:ai.
- Programming experience with Python, Go, Ruby, shell scripting, and PowerShell, along with hands-on experience with ELK, Prometheus, Grafana, Ansible, Git, or similar technologies.
Active Job
Updated 5 days ago