Site Reliability Engineer - III

Innovaccer Analytics Private Limited
Noida
5-10 LPA
Posted 20 May 2025
FULL TIME
Kafka
Kubernetes
Elasticsearch
CI/CD
Site Reliability Engineering

Job Description

We at Innovaccer are looking for a Site Reliability Engineer II to build the most amazing product experience. You'll work with other engineers to build delightful feature experiences that help us understand and solve our customers' pain points. In this role, you will be responsible for building and automating secure cloud infrastructure (Infrastructure as Code, IaC) across pillars such as cost, reliability, scalability, and performance.

A Day in the Life

  • Design and architect solutions across the various domains of SRE.
  • Collaborate extensively with different teams to drive initiatives and the adoption of SRE best practices.
  • Build and automate secure cloud infrastructure (Infrastructure as Code, IaC) across pillars such as cost, reliability, scalability, and performance.
  • Build the CI/CD stack in collaboration with the Dev and QA/Automation teams, and move the organization to a new level of continuous delivery and deployment (daily or even hourly releases).
  • Security is paramount to everything we do. You will work closely with the CISO and the Dev team(s) to make security a first-class citizen: develop Secure CI/CD (S-CICD), enable security toolchains, and deliver vulnerability reports to developers through automation.
  • Observability is critical at the scale of our systems, both for gaining insight into system behavior and for detecting problems and failures. We are looking for someone to lead this charter across logs, metrics, service mesh, tracing, and more.
  • Collaborate closely with the Dev and QA teams to bring initiatives to a close and increase adoption of DevOps practices and toolchains.
  • Apply strong analytical skills to understand production system metrics, drive changes, optimize system utilization, and improve cost efficiency.
  • Automatically scale the platform up and down to handle peak-season demand.
  • Ensure that the platform is secured according to the guidelines established by the CISO, e.g., protect against DDoS attacks by implementing a WAF, handle vulnerability and patch management, and install the required security agents.
  • Lead the implementation of least-privilege RBAC for production services and toolchains.
  • Build and execute a Disaster Recovery plan.
  • Participate as a key stakeholder in Incident Response (IR).

What You Need

  • 5+ years of experience as a DevOps/SRE engineer.
  • Solid experience with at least one major cloud (AWS, Azure, or GCP), with a focus on automation. Certification is an advantage.
  • Hands-on experience with Kubernetes and Linux.
  • Programming experience with scripting languages, e.g., Python.
  • Experience designing and building scalable CI/CD architectures and solutions is preferred.
  • Experience building an observability stack spanning logs, metrics, traces, service mesh, and data observability is preferred.
  • Strong documentation skills, including structuring documents for consumption by various dev teams.
  • Cloud security experience is a major advantage and a highly preferred skill.
  • Hands-on experience with some of the following is preferred: Kafka, Postgres, Snowflake, etc.

Preferred Skills

  • Multi Cloud: AWS, Azure, GCP
  • Distributed Compute: Kubernetes (EKS/AKS), Containerization
  • Persistence stores: Postgres, MongoDB
  • Data Warehousing: Snowflake, Databricks
  • Messaging: Kafka
  • CI/CD: Jenkins, ArgoCD, GitOps
  • Observability: Elasticsearch, Prometheus, Jaeger, New Relic, etc.