Applexus Technologies
Site Reliability Engineering Manager
Bangalore ₹14-20 LPA Posted 15 Apr 2025
FULL TIME
Devops
Java
Python
Job Description
- As a Site Reliability Engineering (SRE) Manager, you will be responsible for building, developing, and retaining a high-performing team of software engineers, and for creating an environment where they can thrive and succeed
- While the primary role is leading and managing employees, you should have deep technical knowledge of distributed systems, cloud computing and security platforms, and be able to quickly understand and respond to peer teams' needs
- Strong experience working with short release cycles is also encouraged. In this role, do not hesitate to:
- - Actively participate in architectural and functional design, implementation and troubleshooting sessions
- - Review hardware, software infrastructure and application functionality to identify and resolve performance bottlenecks
- - Drive major incident management to restore order
- - Spearhead the design and implementation of comprehensive monitoring for applications, integrations and anomalies
- - Innovate, find opportunities and drive automation efforts across various platform and security applications
- - Work closely with the cross-functional IT organization, business group, Apple's production support team, application engineers, systems engineers, database administrators and QA team to ensure effective implementation and reliability of platforms and applications
- - A proven track record of managing, motivating and providing technical guidance to a team of software engineers to draw out their best work will be key to success
- - Ensuring quality in every deliverable, creative thinking, strong problem-solving, and the ability to collaborate with other global cross-functional teams in a fast-paced environment will be meaningful attributes to succeed in this role
- 10+ years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
- 3+ years of experience leading and managing high-performing SRE teams.
- Proven track record of leading sophisticated SRE projects and enterprise services at large scale
- Strong analytical, troubleshooting and problem-solving skills
- Good knowledge of at least one object-oriented programming language (preferably Java or Python)
- Unix Performance Monitoring & Tuning
- Good understanding of database concepts, PL/SQL and NoSQL technologies.
- Hands-on experience with monitoring and data analysis tools (e.g., Prometheus, Splunk, Grafana, CloudWatch)
- Experience building and operating container orchestration systems such as Kubernetes or EKS.
- Deep understanding of security concepts and protocols: authentication, authorization, signing, encryption, SSL/TLS, SSH/SFTP, PKI, X.509 certificates and PGP.
- Good fundamentals in release management and continuous integration
- Familiarity with modern web services architectures, cloud platforms such as AWS, GCP and Azure, and distributed storage systems (ScaleIO, Amazon S3).
- Ability to communicate with large cross-functional teams about various engineering topics such as system architecture, detailed design, APIs, project schedules etc.
- Ability to make the right trade-off choices when dealing with functional complexity, conflicting priorities and aggressive schedules
- Represent the team and remove hurdles to enable each team member to operate at the highest level of efficiency and productivity
- Ability to hire, mentor and manage the performance of a large team.
- Ability to connect with senior executives and business stakeholders.
- A learning attitude to continuously improve self, team and the organisation.
- Ability to work under pressure and manage difficult situations in a fast-paced work environment.
- Bachelor's or Master's degree, or equivalent experience, in Computer Science or a related field.
Preferred Qualifications
- Experience with runtime configuration and troubleshooting of Java and JVM technologies is a plus
- Good fundamentals in data modelling and machine learning algorithms
- Strong knowledge of securing applications, with a thorough understanding of the OWASP Top 10 risks and their solutions.
