Director, Site Reliability Engineering

Adobe
Noida
18-20 LPA
Posted 16 Apr 2025
FULL TIME
DevOps
Datadog
Infrastructure Engineer
AWS

Job Description

As Director of Site Reliability Engineering, you will lead multiple SRE teams across Noida and Bangalore, with several tiers of leaders reporting to you. You will play a pivotal role in:

• Driving system reliability, scalability, and performance for Adobe's solutions. 

• Owning the technical direction for automation, monitoring, and infrastructure provisioning.

• Collaborating with engineering, product, and operations teams to drive innovation and reliability at scale. 

What you'll do

• Leadership & Strategy: Develop and execute the SRE roadmap to ensure high availability (99.99%+ uptime), scalability, and reliability of Adobe's products.

• Operational Excellence: Define and implement best practices for observability, monitoring, and incident response, leveraging advanced AI/ML-powered analytics. 

• Automation & Infrastructure: Drive automation initiatives for CI/CD, infrastructure provisioning, and self-healing capabilities to reduce toil and increase efficiency. 

• Incident Response & Performance Optimization: Establish proactive incident management processes, conduct blameless postmortems, and continuously improve system resilience. 

• Cloud & Big Data Technologies: Optimize Adobe's cloud-native architectures (AWS, Azure, GCP) and integrate big data technologies such as Hadoop, Spark, Kafka, and Cassandra. 

• Cross-functional Collaboration: Work closely with product management, marketing, customer success, and global consulting teams to align business goals with engineering efforts. 

• Customer Engagement: Partner with enterprise clients on pre-sales and post-sales engagements, providing technical guidance and reliability best practices. 

• Team Development & Mentorship: Build and mentor a world-class SRE team, fostering a culture of innovation, ownership, and operational excellence. 

What you need to succeed

• 18+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, with at least 8 years in leadership roles. 

• Proven track record of leading large-scale, high-impact engineering projects in a global enterprise. 

• Experience managing multiple teams (4+ years as a second-level manager). 

• Prior experience working with US-based leadership; previous work experience in the US is a plus. 

• Strong expertise in distributed systems, microservices, cloud platforms (AWS/Azure/GCP), and container orchestration (Kubernetes, Docker, ECS). 

• Hands-on experience with monitoring & observability tools (Datadog, Prometheus, ELK, OpenTelemetry). 

• Deep understanding of SLOs, SLIs, SLAs, and error budgets to drive service reliability. 

• Excellent stakeholder management skills, with the ability to collaborate across engineering, business, and customer-facing teams. 

• A strategic thinker with intellectual curiosity about products, market trends, and business growth. 

• Strong communication, analytical, and problem-solving skills with the ability to influence C-suite executives. 

• B.Tech / M.Tech in Computer Science from a premier institute. 
