Site Reliability Engineer

Airtel
Gurgaon | 5-10 LPA | Posted 11 Apr 2025
FULL TIME
Jenkins
Docker
Ansible
Kubernetes
Linux

Job Description

Key Deliverables:

  • Ensure 100% uptime by designing, deploying, and maintaining highly available, scalable global systems.
  • Lead incident response, perform root cause analysis, and conduct blameless post-mortems to improve system resilience.
  • Automate deployment, monitoring, and maintenance processes to enhance platform stability and reliability.
  • Oversee production environments spanning applications, middleware, infrastructure, and databases such as Postgres, MongoDB, and MySQL.
  • Develop and implement CI/CD pipelines and configuration management using Jenkins, Ansible, and shell scripting.

Role Responsibilities:

  • Drive architecture design and implementation for containerized, cloud-native applications using Docker and Kubernetes.
  • Collaborate with Agile teams to define technical requirements and best practices for reliability engineering.
  • Monitor system health using tools like Prometheus, ELK, AppDynamics, and Nagios; ensure proactive scaling and performance tuning.
  • Manage middleware environments (WebLogic, Tomcat, JBoss) and distributed systems (RabbitMQ, Kafka, Redis).
  • Participate in planning sessions, architecture/code reviews, and mentor team members on SRE practices and tooling.
