Job Description
Inviting applications for the role of Principal Consultant, SRE Engineer
As the need for reliability grows with every critical application that we add to our landscape, we are seeking an experienced SRE to help our tribe become more reliability-driven by setting up the right design patterns and processes to ensure we have an observable, reliable and, where possible, self-healing product landscape. With the experience you bring, you will play a vital role in managing and leading a team while also keeping them motivated.
Responsibilities - Scope
Monitoring the performance of our production systems using a host of monitoring tools
Proactively identifying and troubleshooting issues such as software bugs, misconfigurations and performance bottlenecks, and coordinating the fixes for those issues
Increasing availability and reliability of our production systems
Coordinating Chaos Testing
Constantly running technical state health assessments on production infrastructure and systems to identify CIs deviating from baseline
Actively monitoring SLAs and ensuring that services perform within promised SLAs
Holding IT Engineering, Security and Architecture accountable for the remediation of any SLA degradation
Ensuring that IT is 'in CONTROL' by holding IT groups accountable for adherence
Collating and providing necessary evidence to Auditors for these controls
Architecting, creating and automatically managing an army of 'runners or bots' that fully automate tasks across infrastructure and applications, e.g. extracting production data, generating production reports, triggering event responses, etc.
Identifying and automating manual operational tasks
Building and integrating tools that will assist in improving system availability, reliability and performance
Coordinating incident management and service restoration.
SREs are part of the on-call team of engineers that support production systems.
Work with BizDevOps squads on post-mortems and assist in identifying and fixing reliability issues
Plan and manage the Disaster Recovery (DR) runbook and DR testing
Gather relevant data and provide accurate production reporting for availability, reliability, performance and capacity.
A small part of the job requires coordinating the response to occasional service requests from our business partners, for example when a business unit requests the restore of a particular backup.
Qualifications we seek in you!
Minimum Qualifications
BTECH
Preferred Qualifications/ Skills
Proven track record of setting up processes in the organisation.
Expert level knowledge in Windows systems administration, including events/services and ASP.NET/.NET Core applications running on IIS
Proficient level knowledge in Unix/Linux administration
Proficient level knowledge in Tomcat application administration
Expert level knowledge in PowerShell scripting and automation
Proficient level knowledge in networking concepts and Windows networking
Proficient level knowledge in monitoring and observability implementations, especially on the Windows stack
Proficient level knowledge in RDBMS concepts and performance management (Oracle/MSSQL)
Proficient level knowledge in Azure Pipelines (other CI/CD solutions can also be considered)
Proficient level knowledge in containerization and container orchestration technologies (K8s/OpenShift)
Expert level understanding of core SRE concepts (SLI/SLA/error budgets) and experience in implementing various SRE models within the organization
Nice to haves:
Experience in leading a team for a brand-new application and platform.
Experience in setting up an agile platform to quickly adapt to a rapidly growing platform.
Experience in the ELK stack and Prometheus + Grafana
Experience in incident/problem management, and being on-call
Experience in enterprise scheduling solutions (TWSd, UAC)
Experience in Python scripting
Experience in Ansible
Experience in infrastructure as code (Terraform/Pulumi), everything as code and, of course, Git
Additional optional criteria which will be considered separately as a 'plus':
Strong plus - Experience in applying SRE concepts to front-office banking processes (e.g. Credit Approval, Monitoring, etc.)
MSc or PhD with excellent academic results in the field of Computer Science, Mathematics, Engineering, Econometrics or similar.
You have a crystal-clear understanding of Agile WoW
You actively keep up to date with the latest developments and incorporate them into your work
You embody our orange code in your professional manner: you take on activities and responsibilities and make them happen, you help others to be successful, and you are one step ahead in anticipating the needs of your colleagues and stakeholders.
