Director, Site Reliability Engineering
Job Description
As Director of Site Reliability Engineering, you will lead multiple SRE teams across Noida and Bangalore, with several layers of leaders reporting to you. You will play a pivotal role in:
• Driving system reliability, scalability, and performance for Adobe's solutions.
• Owning the technical direction, automation, monitoring, and infrastructure provisioning.
• Collaborating with engineering, product, and operations teams to drive innovation and reliability at scale.
What you'll do:
• Leadership & Strategy: Develop and execute the SRE roadmap to ensure high availability (99.99%+ uptime), scalability, and reliability of Adobe's products.
• Operational Excellence: Define and implement best practices for observability, monitoring, and incident response, leveraging advanced AI/ML-powered analytics.
• Automation & Infrastructure: Drive automation initiatives for CI/CD, infrastructure provisioning, and self-healing capabilities to reduce toil and increase efficiency.
• Incident Response & Performance Optimization: Establish proactive incident management processes, conduct blameless postmortems, and continuously improve system resilience.
• Cloud & Big Data Technologies: Optimize Adobe's cloud-native architectures (AWS, Azure, GCP) and integrate big data technologies such as Hadoop, Spark, Kafka, and Cassandra.
• Cross-functional Collaboration: Work closely with product management, marketing, customer success, and global consulting teams to align business goals with engineering efforts.
• Customer Engagement: Partner with enterprise clients on pre-sales and post-sales engagements, providing technical guidance and reliability best practices.
• Team Development & Mentorship: Build and mentor a world-class SRE team, fostering a culture of innovation, ownership, and operational excellence.
What you need to succeed:
• 18+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, with at least 8 years in leadership roles.
• Proven track record of leading large-scale, high-impact engineering projects in a global enterprise.
• Experience managing multiple teams (4+ years as a second-level manager).
• Prior experience working with US-based leadership; previous work experience in the US is a plus.
• Strong expertise in distributed systems, microservices, cloud platforms (AWS/Azure/GCP), and containerization and orchestration (Docker, Kubernetes, ECS).
• Hands-on experience with monitoring & observability tools (Datadog, Prometheus, ELK, OpenTelemetry).
• Deep understanding of SLOs, SLIs, SLAs, and error budgets to drive service reliability.
• Excellent stakeholder management skills, with the ability to collaborate across engineering, business, and customer-facing teams.
• A strategic thinker with intellectual curiosity about products, market trends, and business growth.
• Strong communication, analytical, and problem-solving skills with the ability to influence C-suite executives.
• B.Tech / M.Tech in Computer Science from a premier institute.
