ITC Infotech India Limited
Sr. Site Reliability Engineer
Navi Mumbai | ₹3-6 LPA | Posted 27 Feb 2025
FULL TIME
Linux
AWS
Job Description
Key Responsibilities:
- Configure, deploy, and operate public cloud services (Azure, AWS, GCP).
- Ensure high availability, security, performance, and disaster recovery best practices.
- Handle production incidents, plan escalations, conduct post-mortems, and perform impact analysis.
- Develop and manage CI/CD pipelines for automation and continuous deployment.
- Balance a development mindset with an SRE mindset (software and infrastructure).
- Implement and maintain Application Performance Monitoring (APM) tools (Zabbix, Grafana, CloudWatch, etc.).
- Work with network and security components (BGP, TCP/IP, DNS, SMTP, HTTPS, Security Guardrails).
- Identify and resolve performance bottlenecks and anomalous system behavior.
- Use Infrastructure as Code (IaC) tools (Terraform, Ansible, Chef, Puppet, CloudFormation, ARM templates).
- Design and manage high availability infrastructure (regions, availability zones, replication).
- Automate and optimize Kubernetes cluster deployment and monitoring.
- Implement network architectures suitable for cloud topologies and cloud service expectations.
- Maintain firewalls and security solutions (Palo Alto, Fortinet, WAF, Cisco routers).
- Work with containerized environments (Docker, Kubernetes, EKS, GKE, Anthos, OpenShift).
- Implement DevOps best practices, rapid prototyping, and agile development methodologies.
- Troubleshoot and debug Kubernetes clusters and cloud infrastructure issues.
- Use SQL and NoSQL databases (for example, PostgreSQL) to manage data storage in the cloud.
- Document troubleshooting processes, automation scripts, and procedural workflows.
- Provide client management support and collaborate with cross-functional teams.
- Stay updated on Kubernetes and cloud technology trends.
Required Skills & Qualifications:
- Years of experience in Site Reliability Engineering, Cloud Engineering, or DevOps.
- Expertise in Azure, AWS, or GCP cloud platforms.
- Hands-on experience with Infrastructure as Code (IaC) and automation tools.
- Strong understanding of networking, security, and scalability in cloud environments.
- Experience in designing and deploying Kubernetes clusters for large-scale applications.
- Proficiency in Python, PowerShell, or Shell scripting for automation.
- Experience with firewalls, security policies, and monitoring tools.
- Strong knowledge of cloud security best practices.
- Ability to work independently and in a team, including 24x7 shifts when required.
Preferred Qualifications:
- Certifications in Azure, AWS, or GCP.
- Experience with Google Cloud Platform (GCP) services like Compute Engine, Cloud Storage, and Kubernetes Engine.
- Experience in a scaled agile environment with DevOps methodologies.
