As Director of Site Reliability Engineering, you will lead multiple SRE teams across Noida and Bangalore, with several tiers of managers reporting to you. You will play a pivotal role in:

• Driving system reliability, scalability, and performance for Adobe's solutions.

• Owning technical direction, automation, monitoring, and infrastructure provisioning.

• Collaborating with engineering, product, and operations teams to drive innovation and reliability at scale.

What you'll do-

• Leadership & Strategy: Develop and execute the SRE roadmap to ensure high availability (99.99%+ uptime), scalability, and reliability of Adobe's products.

• Operational Excellence: Define and implement best practices for observability, monitoring, and incident response, leveraging advanced AI/ML-powered analytics.

• Automation & Infrastructure: Drive automation initiatives for CI/CD, infrastructure provisioning, and self-healing capabilities to reduce toil and increase efficiency.

• Incident Response & Performance Optimization: Establish proactive incident management processes, conduct blameless postmortems, and continuously improve system resilience.

• Cloud & Big Data Technologies: Optimize Adobe's cloud-native architectures (AWS, Azure, GCP) and integrate big data technologies such as Hadoop, Spark, Kafka, and Cassandra.

• Cross-functional Collaboration: Work closely with product management, marketing, customer success, and global consulting teams to align business goals with engineering efforts.

• Customer Engagement: Partner with enterprise clients on pre-sales and post-sales engagements, providing technical guidance and reliability best practices.

• Team Development & Mentorship: Build and mentor a world-class SRE team, fostering a culture of innovation, ownership, and operational excellence.

What you need to succeed-

• 18+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, with at least 8 years in leadership roles.

• Proven track record of leading large-scale, high-impact engineering projects in a global enterprise.

• Experience managing multiple teams (4+ years as a second-level manager).

• Prior experience working with US-based leadership; previous work experience in the US is a plus.

• Strong expertise in distributed systems, microservices, cloud platforms (AWS/Azure/GCP), and containers and container orchestration (Docker, Kubernetes, ECS).

• Hands-on experience with monitoring & observability tools (Datadog, Prometheus, ELK, OpenTelemetry).

• Deep understanding of SLOs, SLIs, SLAs, and error budgets to drive service reliability.

• Excellent stakeholder management skills, with the ability to collaborate across engineering, business, and customer-facing teams.

• A strategic thinker with intellectual curiosity about products, market trends, and business growth.

• Strong communication, analytical, and problem-solving skills with the ability to influence C-suite executives.

• B.Tech / M.Tech in Computer Science from a premier institute.

Director, Site Reliability Engineering
