Site Reliability Engineer (SRE)

Top Skills Required:
1. SRE Monitoring
2. Dynatrace
3. Azure Kubernetes Service (AKS)

Role Overview

We are seeking a highly skilled Site Reliability Engineer (SRE) to own the overall health, availability, performance, and resilience of our enterprise platform. The platform spans SQL Server, .NET, Java, React.js, Microservices, and Kafka, and operates in a hybrid cloud environment on Azure and on-premises.

The SRE will lead reliability engineering practices across the stack, manage infrastructure deployment pipelines using Terraform, drive application deployments through GitHub and Azure DevOps, ensure timely remediation of security vulnerabilities, and implement world-class observability using Dynatrace and Splunk.
________________________________________
Key Responsibilities

Platform Reliability & Operations
• Own the end-to-end health, uptime, performance, and reliability of the platform across cloud (Azure) and on-prem environments.
• Ensure resilience across application layers: .NET, Java, React.js, Microservices, and backend systems such as SQL Server and Kafka.
• Lead incident management, root cause analysis, and post-incident reviews with a focus on continuous improvement.

Infrastructure Engineering & Automation
• Design, implement, and maintain cloud and on-prem infrastructure using Terraform (IaC).
• Own and optimize CI/CD pipelines for infrastructure and applications in:
  o GitHub Actions
  o Azure DevOps
• Improve deployment automation, reliability, and release processes across all teams.

Observability, Monitoring & Proactive Operations
• Implement and enhance monitoring, alerting, dashboards, and analytics using:
  o Dynatrace (APM, RUM, synthetic monitoring, logs, metrics)
  o Splunk (log search, correlation, alerting)
• Build proactive monitoring workflows to detect issues before they impact customers.
• Own SRE metrics such as SLOs, SLIs, Error Budgets, MTTR, MTBF, availability KPIs, and system productivity metrics.
• Tune the performance of database and application services.

Security & Compliance
• Ensure all platform and application security vulnerabilities are identified and remediated on time.
• Partner with cybersecurity to ensure compliance with enterprise standards and policies.
• Automate security scans and integrate them into CI/CD pipelines.

Performance & Scalability
• Conduct performance analysis, load testing, and tuning across:
  o Microservices
  o SQL Server databases
  o Kafka clusters
  o Front-end React.js applications
• Partner with engineering teams to design scalable, reliable system architectures.

Collaboration & Leadership
• Collaborate with development, architecture, infrastructure, and security teams.
• Advocate for SRE and DevOps culture: automation, reliability engineering, blameless postmortems.
• Mentor developers and engineers on reliability best practices and tools.
________________________________________
Required Qualifications
• 5+ years of experience in SRE, DevOps, or Platform Engineering roles.
• Strong expertise in:
  o SQL Server administration and performance tuning
  o .NET, Java, Microservices architectures
  o React.js fundamentals
• Hands-on experience with:
  o Azure Cloud services (VMs, AKS, App Services, Networking)
  o On-prem servers and hybrid integrations
  o Terraform (writing, testing, maintaining modules)
  o CI/CD with GitHub and Azure DevOps
• Proficiency with observability tools:
  o Dynatrace (preferred)
  o Splunk
• Experience with Kafka (producers, consumers, performance, tuning).
• Strong understanding of SRE fundamentals:
  o SLO/SLI design
  o Error budgets
  o Distributed systems concepts
  o Incident response
________________________________________
Preferred Qualifications
• Experience with containerization and Kubernetes (AKS or on-prem K8s).
• Experience with service mesh, API gateway technologies, or event-driven architectures.
• Knowledge of secure coding practices and integrating security in CI/CD.
• Familiarity with enterprise networking, firewalls, and hybrid connectivity.
________________________________________
Soft Skills
• Strong communication and collaboration abilities.
• Analytical mindset with strong problem-solving skills.
• Ability to handle pressure in high-severity incidents.
• Passion for automation, simplification, and continuous improvement.
________________________________________
Job Impact

In this role, you will directly influence the reliability and stability of core enterprise services used by millions of customers and internal users. You will serve as a technical leader who bridges development, infrastructure, operations, and security to deliver a world-class, resilient platform.

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.