Application Production Support - Site Reliability Engineer and DevOps
Job Title: Application Production Support - Site Reliability Engineer and DevOps
Location: Berkeley Heights, NJ (5 days onsite; local candidates preferred)
Employment Type: Full-time

About Smart IT Frame:
At Smart IT Frame, we connect top talent with leading organizations across the USA. With over a decade of staffing excellence, we specialize in IT, healthcare, and professional roles, empowering both clients and candidates to grow together.

Role Overview:
We are looking for an AppOps Engineer with strong experience in release management, deployment, and production support in a cloud-native environment.

Key Responsibilities:
1) Manage release and deployment activities across environments (especially PPD and Production).
2) Create, review, and manage Merge Requests (MRs) in GitLab as part of deployment workflows.
3) Follow Git branching strategies and ensure proper version control practices.
4) Monitor deployments using Argo CD (GitOps) and ensure successful rollout of applications.
5) Handle production readiness, release coordination, and deployment windows.
6) Work closely with Dev teams, QA, and stakeholders during deployments and testing cycles.

Technical Skills Required:
1) Cloud & Kubernetes: Strong hands-on experience with AWS (EKS), CloudWatch, and other AWS services.
2) Solid understanding of Kubernetes architecture.
3) Understanding of the Pod lifecycle, Deployments, Services, and ConfigMaps.
4) Experience using kubectl commands for troubleshooting.
5) Strong experience with GitLab and Git branching strategies.
6) Experience with Argo CD for application deployment and monitoring, and an understanding of the GitOps-based deployment model.
7) Hands-on experience with Helm charts and helm lint, and with managing application configurations using Helm.
8) Monitoring & Observability: Dynatrace (APM monitoring), Splunk (logs and debugging), and Moogsoft (alerts and incidents).
9) Scripting & OS: Linux and Shell / Bash / PowerShell.
10) Ability to troubleshoot at the system and application level.
11) Messaging & Integration: Basic knowledge of Kafka (producers, brokers, connectivity issues).
12) Scheduling / Batch: Experience with Control-M for job scheduling and monitoring.
13) Experience working in an Agile environment (Scrum / SAFe).
14) Participation in sprint planning and PI planning.
15) Use of Jira (task tracking) and Confluence (documentation).
16) Continuous Integration and Continuous Delivery/Deployment (CI/CD).
17) DAST (Dynamic Application Security Testing) scanning.

Good to Have:
1) Understanding of banking/payment systems and environments.
2) Experience working with multi-environment setups (QA, PPD, PROD).
3) Exposure to chaos testing / resilience validation.