JOBSEARCHER

Site Reliability Engineer (SRE)

Diverse Lynx
Plano, TX
May 21st, 2026
Job Title: Site Reliability Engineer (SRE)
Location: Plano, TX (5 days onsite)
Long Term Project
Cost: $55/hr on C2C
Domain: Commercial & Investment Banking

Job Overview

As a Site Reliability Engineer (SRE), you will be responsible for solving complex business problems through scalable and efficient solutions. Leveraging code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their underlying systems. You will play a key role in ensuring reliability, availability, and performance while continuously improving operational processes.

Key Responsibilities

- Design and contribute to scalable and reliable system architectures, collaborating with peers to reach consensus
- Work with engineering teams to implement CI/CD pipelines and automated deployment strategies
- Design, develop, test, and implement solutions focused on availability, scalability, and reliability
- Build and manage infrastructure, configuration, and network as code
- Collaborate with stakeholders and technical teams to troubleshoot and resolve complex issues
- Monitor system performance and proactively address risks before they impact users
- Drive adoption of SRE best practices, including reliability engineering and operational excellence

Required Skills & Qualifications

- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
- 3+ years of hands-on experience in software engineering or site reliability engineering
- Strong understanding of reliability, scalability, performance, and security principles
- Proficiency in at least one programming language or framework (Python, Java, or Spring Boot)
- Experience with observability tools and monitoring systems such as Grafana, Prometheus, Datadog, Splunk, or Dynatrace
- Hands-on experience with CI/CD and infrastructure-as-code tools (Jenkins, GitLab, Terraform)
- Experience with containerization and orchestration tools (Docker, Kubernetes, ECS)
- Knowledge of networking fundamentals and troubleshooting techniques
- Experience implementing and maintaining SLO/SLA frameworks for critical systems
- Familiarity with chaos engineering tools (e.g., Gremlin, Chaos Monkey)
- Comfortable working with system metrics such as latency, throughput, and availability

Preferred Qualifications

- Experience with cloud platforms and infrastructure (AWS, Azure, or GCP)
- Understanding of infrastructure components such as load balancers, routers, storage, and compute systems
- Hands-on experience with tools like Jira, Confluence, ServiceNow, and Netcool
- Strong problem-solving and analytical skills
- Ability to recommend and implement tools and processes to improve system reliability and efficiency

Additional Information

- Strong collaboration and communication skills are essential
- Ability to work in a fast-paced, production-critical environment

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence, and proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.