GreyOrange
Senior Site Reliability Engineer-III
Gurgaon | ₹4-6 LPA | Posted 27 Oct 2025
FULL TIME
Distributed Systems
Monitoring
SRE
Job Description
Key Responsibilities:
- Define and enforce SLOs, SLIs, and error budgets across microservices.
- Architect an observability stack (metrics, logs, traces) and derive operational insights from it.
- Automate toil and manual operations through robust tooling and runbooks.
- Own the incident response lifecycle: detection, triage, RCA, and postmortems.
- Collaborate with product teams to build fault-tolerant and scalable systems.
- Champion performance tuning, capacity planning, and scalability testing.
- Optimize cloud costs while maintaining infrastructure reliability.
- Participate in on-call rotations and manage large-scale production systems.
