
Senior Dev Operations Engineer

Softsol
Pleasanton, CA
April 12th, 2026
Job Title: Senior Dev Operations Engineer SRE (CR260)
Location: Remote
Duration: Long Term

Must Haves:
- Experience setting up alerts / alarms / notifications in AWS cloud (CloudWatch / Dynatrace).
- Experience with AWS solutions using AWS services including Kafka, ECS, EKS.
- Experience with IaC (Infrastructure as Code): CDK or Terraform.

Objective:
The Site Reliability Engineer (SRE) will be a lead on the DevOps team and is responsible for system administration areas including monitoring, installation, configuration, maintenance, operations, and architecture of AWS cloud environments and on-premises environments. The candidate will work within a team to implement and maintain all production and pre-production environments through tools and automation. We are looking for a candidate with exceptional site reliability and DevOps skills and extensive knowledge and experience in implementing solutions and tools to maintain and grow all application environments. Most importantly, the right individual will possess a positive, "can-do" attitude and a passion for delivering technical solutions in a fast-paced environment. In addition, the individual will be dedicated and independent, and will collaborate at a high level to ensure the stability and reliability of infrastructure and applications running in the AWS Cloud and on-premises environments.
Advanced experience working in AWS environments is expected while leading the implementation of improvements and advancements.

Deliverables:
- Monitoring sites, environments, and software by implementing tools and automation to achieve 99.9% uptime.
- Measurement, optimization, and tuning of system performance, ensuring that systems run reliably and are highly available in a 24/7 production environment.
- Automate system and application monitoring using monitoring and automation tools.
- Anticipating potential problems before they occur and coming up with solutions.
- Conducting post-incident reviews and Root Cause Analysis.
- Documenting your work to turn findings into repeatable actions.
- Coding automation within a site infrastructure.
- Implement production monitoring systems.
- Utilize strong analytical and problem-solving skills.
- Security assessments and addressing vulnerabilities.
- Design and deploy AWS solutions using AWS services (e.g. EC2, S3, Glacier, ELB, RDS, IAM, Route 53, VPC, Auto Scaling, CloudWatch, CloudTrail, CloudFormation, Security Groups, API Gateway, SSM, Route table, Endpoint service, etc.).
- Provision, management, and day-to-day operations of AWS environments.
- Implement alarms / alerts / notifications using AWS services (e.g. CloudWatch).
- Implement AWS Multi-AZ accounts for HA and DR.
- Design AWS infrastructure that minimizes operational costs through push-button deployment at scale with near-zero downtime.
- Develop and maintain configuration management solutions.
- Provide technical guidance, knowledge transfers, and mentorship to State Fund internal engineering peers as required, and lead technical staff responsibilities.
- Server maintenance based on updates, system requirements, data usage, and antivirus requirements.
- Responsible for the design, implementation, and support of large-scale web farm infrastructure across multiple data centers supporting the Infrastructure as a Service (IaaS) offering.
- Help engineering implement new technologies in development for future production deployment.
- Working with the team to analyze and design infrastructure, which includes virtualization, clustering, database, disaster recovery, and geographic redundancy.
- Triage and provide technical solutions to environment-related issues encountered by new and existing applications.
- Support developers with change requests, uptime, and performance-related issues.
- Documentation of work with regard to bug reports, systems analysis, application monitoring, and common task reporting.
- Author internal documentation, such as environment diagrams, installation/configuration documents, and release notes.
- Assist in establishing and implementing configuration management program and policies.
- Troubleshoot and debug environment and infrastructure problems found in the production and non-production environments.
- Collaborating with software developers, engineers, and operations teams.
- Provide 24x7 production support.

Technical Knowledge and Skills:
- 6+ years of overall IT experience.
- 4+ years of AWS Cloud management experience with the skill set below.
- AWS Certified DevOps and / or Solution Architect certification.
- Experience in AWS provisioning, operations, and management of AWS environments.
- Experience setting up alerts / alarms / notifications in AWS cloud (CloudWatch / Dynatrace).
- Experience with AWS solutions using AWS services including Kafka, ECS, EKS.
- Experience with IaC (Infrastructure as Code): CDK or Terraform.
- Experience setting up / maintaining Multi-AZ infrastructure including HA and DR in AWS.
- Experience with code repositories: Azure DevOps Server, Git, GitLab, SVN.
- Experience with continuous integration tools: Jenkins, Azure Pipelines.
- Excellent knowledge of Linux systems.
- Experience with system automation and configuration management tools including Ansible.
- Experience with Python scripting.
- Strong background in networking, load balancing, and firewalls.
- High-level understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting, and load balancing.
- Thorough understanding of and experience with managing web applications in a highly available environment.
- Experience in software development is a plus.
- Familiarity with deploying and configuring Java and .NET applications.
- Experience with Application Security Testing tools is a plus (Coverity, Tenable, Black Duck, etc.).
- Understanding of SQL, PL/SQL, and T-SQL commands.