Site Reliability Engineer (SRE)
Job Description
• Design, build, and maintain scalable, reliable infrastructure for SaaS systems
• Manage and support microservices-based architectures in production environments
• Implement infrastructure as code using tools such as Terraform, CloudFormation, or Pulumi
• Deploy and manage applications on AWS following the principles of the AWS Well-Architected Framework
• Work with Kubernetes for container orchestration and service reliability
• Automate infrastructure provisioning, deployment, and monitoring processes
• Ensure high availability, performance, and scalability of production systems
• Collaborate with development teams to improve system reliability and operational efficiency
• Monitor systems, identify issues proactively, and implement corrective actions
• Participate in incident management and root cause analysis for production issues
• Work in Agile environments and follow SDLC best practices
