SRE (Site Reliability Engineer)

Apex One
Pune | 5-8 LPA | Posted 13 Oct 2025
Full-time
Key skills: System Monitoring, Incident Response, Collaboration, Configuration, Continuous Improvement

Job Description

Roles and Responsibilities

  • Implement and manage system monitoring solutions to track health, performance, and availability
  • Identify and resolve incidents promptly to reduce downtime and customer impact
  • Lead incident response efforts and conduct root cause analysis
  • Drive continuous improvement initiatives to increase system reliability and maintainability
  • Participate in post-mortems and blameless retrospectives
  • Collaborate closely with development, operations, and other cross-functional teams
  • Maintain configuration management for applications and systems
  • Implement comprehensive service monitoring (dashboards, metrics, alerts)
  • Define, measure, and meet service level objectives (uptime, performance, incident frequency)
  • Support high-quality product development and release in partnership with stakeholders