Senior Member Technical Staff - ITOps

GreyOrange
Gurgaon | 5-7 LPA | Posted 27 Oct 2025
FULL TIME
Docker
Kubernetes
GCP
Cloud Infrastructure
Site Reliability Engineering

Job Description

Key Responsibilities:

  • Lead reliability engineering projects and drive them to successful completion.
  • Ensure system stability, high availability, and optimal performance through proactive monitoring and troubleshooting.
  • Design, build, and maintain reliable and scalable cloud-based infrastructure and services.
  • Implement and manage observability tools (Grafana, Splunk, Dynatrace) for real-time monitoring, alerting, and logging.
  • Automate manual and repetitive processes using Python, Bash, or PowerShell to enhance operational efficiency.
  • Manage and optimize CI/CD pipelines and automation frameworks (Jenkins, GitLab CI, Ansible, Chef).
  • Drive adoption of SRE principles, including SLIs, SLOs, SLAs, and error budgets, across teams.
  • Provide on-call support and lead incident management, ensuring effective root cause analysis and postmortems.
  • Collaborate with development and infrastructure teams to enhance platform observability and reduce operational toil.
  • Engage in capacity planning, cost optimization, and scalability strategy discussions.