Sr. Site Reliability Engineer (SRE)
About The Role
As a Senior Site Reliability Engineer, you will make an impact by designing and operating scalable, resilient, and highly available cloud‑native platforms that support critical business applications. You will be a valued member of a cross‑functional engineering team and will work collaboratively with software engineers, architects, and product partners to embed reliability, automation, and performance best practices across the delivery lifecycle.

In This Role, You Will
Design and implement scalable, fault‑tolerant architectures aligned with business, performance, and reliability objectives
Embed Site Reliability Engineering (SRE) and DevOps principles into application development and operations workflows
Build and maintain infrastructure and automation using tools such as Terraform, Docker, and Kubernetes to reduce manual effort and improve system resilience
Implement and manage CI/CD pipelines and source control practices using GitHub to streamline development and deployment
Monitor, measure, and optimize system health using observability platforms such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack

Work model
We believe hybrid work is the way forward as we strive to provide flexibility wherever possible. Based on this role’s business requirements, this is a hybrid position requiring 2–3 days per week in a client or Cognizant office in Plano, TX. Regardless of your working arrangement, we are here to support a healthy work‑life balance through our various wellbeing programs.

The working arrangements for this role are accurate as of the date of posting. They may change based on the project you’re engaged in, as well as business and client requirements.
Rest assured, we will always be clear about role expectations.

What You Need To Have To Be Considered
Strong experience in Site Reliability Engineering (SRE) and/or DevOps supporting large‑scale, cloud‑native systems
10+ years of experience
Hands‑on expertise with AWS and with containerization and container orchestration technologies such as Docker and Kubernetes
Proficiency with infrastructure‑as‑code and automation tools, including Terraform
Experience implementing monitoring, alerting, and incident response using observability tools such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack
Bachelor’s degree in Computer Science, Information Technology, or a related field

These will help you stand out
AWS Certified Solutions Architect or equivalent cloud certification
Experience implementing reliability best practices such as SLIs, SLOs, error budgets, and proactive incident management
Strong analytical and troubleshooting skills, with the ability to diagnose complex infrastructure and application issues
Experience mentoring engineers and contributing to a culture of continuous improvement and knowledge sharing

Salary And Other Compensation
The annual salary for this position depends on the experience and other qualifications of the successful candidate.
This position is also eligible for Cognizant’s discretionary annual incentive program, based on performance and subject to the terms of Cognizant’s applicable plans.

Benefits
Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:
Medical/Dental/Vision/Life Insurance
Paid holidays plus Paid Time Off
401(k) plan and contributions
Long‑term and Short‑term Disability
Paid Parental Leave
Employee Stock Purchase Plan

Disclaimer
The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.