GreyOrange
Senior Site Reliability Engineer
Gurgaon ₹4-7 LPA Posted 27 Oct 2025
FULL TIME
Jenkins
Bash
Site Reliability Engineering
Automation
Python
Job Description
Key Responsibilities:
- Lead reliability engineering projects from design to execution, ensuring alignment with business objectives.
- Ensure system stability, performance, and high availability by proactively monitoring production systems and troubleshooting issues as they arise.
- Design, build, and maintain scalable, efficient, and reliable cloud-based infrastructure and services.
- Automate manual processes to improve platform observability, reduce operational toil, and enhance reliability.
- Implement and manage observability solutions using tools such as Grafana, Splunk, and Dynatrace for comprehensive monitoring, alerting, and logging.
- Own end-to-end availability, performance, and scalability of critical services and internal tools.
- Apply and manage SLI, SLO, SLA, and Error Budget frameworks to maintain service reliability.
- Provide on-call support and lead incident management and response activities.
- Conduct blameless postmortems to identify root causes and ensure preventive measures are implemented.
- Collaborate with development and infrastructure teams to integrate reliability best practices into design and deployment processes.
- Manage and maintain internal tools and infrastructure used by other development teams.
