Java SRE Engineer
System Reliability & Performance
Ensure applications built in Java run reliably, with minimal downtime.
Define and monitor SLIs (Service Level Indicators), SLOs (Service Level Objectives), and error budgets to measure reliability.

Incident Management
Troubleshoot production issues, perform root cause analysis, and implement permanent fixes.
Automate repetitive operational tasks to reduce manual intervention.

Monitoring & Observability
Build dashboards using tools such as Prometheus, Grafana, and the ELK stack.
Implement logging and tracing for distributed Java applications.

Automation & CI/CD
Develop scripts and pipelines for deployment, scaling, and rollback.
Integrate with DevOps practices for continuous delivery.

Collaboration
Work closely with developers to design reliable systems.
Partner with operations teams to ensure smooth deployments and upgrades.

Required Skills
Programming: Strong expertise in Java.
Cloud Platforms: AWS, Azure, GCP; Docker for containerization and Kubernetes for container orchestration.
DevOps Tools: Jenkins, GitHub Actions, Terraform, Ansible, Chef.
Monitoring & Observability: Prometheus, Grafana, ELK, Jaeger.
OS & Networking: Linux administration, DNS, load balancing, distributed systems.
Soft Skills: Problem-solving, communication, collaboration across dev and ops teams.