Airtel
Site Reliability Engineer
Gurgaon | ₹5-10 LPA | Posted 11 Apr 2025
FULL TIME
Jenkins
Docker
Ansible
Kubernetes
Linux
Job Description
Key Deliverables:
- Ensure 100% uptime by designing, deploying, and maintaining highly available, scalable global systems.
- Lead incident response, perform root cause analysis, and conduct blameless post-mortems to improve system resilience.
- Automate deployment, monitoring, and maintenance processes to enhance platform stability and reliability.
- Oversee production environments, including applications, middleware, infrastructure, and databases such as PostgreSQL, MongoDB, and MySQL.
- Develop and implement CI/CD pipelines and configuration management using Jenkins, Ansible, and shell scripting.
Role Responsibilities:
- Drive architecture design and implementation for containerized, cloud-native applications using Docker and Kubernetes.
- Collaborate with Agile teams to define technical requirements and best practices for reliability engineering.
- Monitor system health using tools like Prometheus, ELK, AppDynamics, and Nagios; ensure proactive scaling and performance tuning.
- Manage middleware environments (WebLogic, Tomcat, JBoss) and distributed systems (RabbitMQ, Kafka, Redis).
- Participate in planning sessions and architecture/code reviews, and mentor team members on SRE practices and tooling.
