GreyOrange
Senior Member Technical Staff - ITOps
Gurgaon ₹5-7 LPA Posted 27 Oct 2025
FULL TIME
Docker
Kubernetes
GCP
Cloud Infrastructure
Site Reliability Engineering
Job Description
Key Responsibilities:
- Lead reliability engineering projects and drive them to successful completion.
- Ensure system stability, high availability, and optimal performance through proactive monitoring and troubleshooting.
- Design, build, and maintain reliable and scalable cloud-based infrastructure and services.
- Implement and manage observability tools (Grafana, Splunk, Dynatrace) for real-time monitoring, alerting, and logging.
- Automate manual and repetitive processes using Python, Bash, or PowerShell to enhance operational efficiency.
- Manage and optimize CI/CD pipelines and automation frameworks (Jenkins, GitLab CI, Ansible, Chef).
- Drive adoption of SRE principles, including SLIs, SLOs, SLAs, and error budgets, across teams.
- Provide on-call support and lead incident management, ensuring effective root cause analysis and postmortems.
- Collaborate with development and infrastructure teams to enhance platform observability and reduce operational toil.
- Engage in capacity planning, cost optimization, and scalability strategy discussions.
