Senior Site Reliability Engineer-III

GreyOrange
Gurgaon
4-6 LPA
Posted 27 Oct 2025
FULL TIME
Distributed Systems
Monitoring
SRE

Job Description

Key Responsibilities:

  • Define and enforce SLOs, SLIs, and error budgets across microservices.
  • Architect an observability stack (metrics, logs, traces) and derive operational insights.
  • Automate toil and manual operations through robust tooling and runbooks.
  • Own the incident response lifecycle: detection, triage, RCA, and postmortems.
  • Collaborate with product teams to build fault-tolerant and scalable systems.
  • Champion performance tuning, capacity planning, and scalability testing.
  • Optimize cloud costs while maintaining infrastructure reliability.
  • Participate in on-call rotations and manage large-scale production systems.
