Senior Site Reliability Engineer

GreyOrange
Gurgaon | 4-7 LPA | Posted 27 Oct 2025
FULL TIME
Jenkins
Bash
Site Reliability Engineering
Automation
Python

Job Description

Key Responsibilities:

  • Lead reliability engineering projects from design to execution, ensuring alignment with business objectives.
  • Ensure system stability, performance, and high availability by proactively monitoring and troubleshooting production issues.
  • Design, build, and maintain scalable, efficient, and reliable cloud-based infrastructure and services.
  • Automate manual processes to improve platform observability, reduce operational toil, and enhance reliability.
  • Implement and manage observability solutions using tools such as Grafana, Splunk, and Dynatrace for comprehensive monitoring, alerting, and logging.
  • Own end-to-end availability, performance, and scalability of critical services and internal tools.
  • Apply and manage SLI, SLO, SLA, and Error Budget frameworks to maintain service reliability.
  • Provide on-call support and lead incident management and response activities.
  • Conduct blameless postmortems to identify root causes and ensure preventive measures are implemented.
  • Collaborate with development and infrastructure teams to integrate reliability best practices into design and deployment processes.
  • Manage and maintain internal tools and infrastructure used by other development teams.