Senior Site Reliability Engineer

Apptad
Mysore
5-10 LPA
Posted 10 Apr 2026
FULL TIME
Kubernetes
Prometheus
Cloud Infrastructure
AWS EKS

Job Description

Key Responsibilities

Kubernetes & EKS Platform Engineering

  • Architect, deploy, and operate production-grade Kubernetes clusters on AWS EKS
  • Automate EKS provisioning with EKS Blueprints and manage the lifecycle of cluster add-ons
  • Plan and execute Kubernetes and EKS version upgrades with minimal service disruption

Autoscaling & Compute Optimization

  • Design and implement Karpenter-based autoscaling solutions for dynamic workload scaling
  • Optimize compute resources for cost efficiency, performance, and high availability

Service Mesh & Traffic Management

  • Design and operate Istio service mesh (including sidecar and ambient mesh models)
  • Implement mesh security and traffic management policies such as mTLS, retries, circuit breaking, and timeouts

Security, Policy & Runtime Protection

  • Implement Kubernetes governance using Kyverno and OPA/Gatekeeper
  • Operate Falco for runtime threat detection and security incident investigation
  • Integrate security and compliance controls into GitOps workflows

Infrastructure as Code & Automation

  • Build and maintain reusable Terraform modules for AWS infrastructure (VPC, EKS, Transit Gateway, etc.)
  • Implement Terragrunt-based multi-account and multi-region infrastructure setups
  • Drive automation to reduce manual operations and improve scalability

GitOps & Platform Operations

  • Design and manage Argo CD for GitOps-based deployment and platform operations
  • Define Git-based promotion workflows and access control models across environments

Observability & SRE Practices

  • Design and maintain monitoring and alerting systems using Prometheus
  • Participate in incident response, root cause analysis, and reliability engineering improvements
  • Reduce operational toil through automation and self-service capabilities

Security & Compliance

  • Own remediation of security findings from tools such as Wiz across AWS and Kubernetes environments
  • Collaborate with security teams to implement preventive security guardrails and best practices