Senior DevOps Engineer
Job Description
Must-have skills:
CI/CD, Go, Terraform, AWS, Kubernetes, Python
Good-to-have skills:
ArgoCD, Datadog, Splunk, Jenkins, Security
Bolo AI (one of Uplers' clients) is looking for:
A Senior DevOps Engineer who is passionate about their work, eager to learn and grow, and committed to delivering exceptional results. If you are a team player with a positive attitude and a desire to make a difference, we want to hear from you.
Role Overview
We are seeking a skilled Senior DevOps Engineer with at least 5 years of experience to optimize and manage our infrastructure. The ideal candidate will have expertise in automating deployments, writing Infrastructure as Code (IaC), and optimizing GPU utilization within Amazon EKS pods and nodes. This role involves creating alerting and monitoring systems, participating in an on-call rotation to ensure system reliability and performance, and mentoring junior engineers.
Responsibilities
System Reliability & Performance
Gain a deep understanding of our platform architecture, codebases, and deployment workflows to ensure 99.99%+ uptime, resilience, and scalability across all production systems.
Proactively monitor system health and resolve incidents with a focus on minimizing downtime and user impact.
Infrastructure Engineering
Design, provision, and manage Kubernetes clusters in AWS EKS, optimizing for cost-efficiency, security, and performance at scale.
Leverage Infrastructure as Code (IaC) tools such as Terraform or CloudFormation to maintain infrastructure in a consistent, repeatable manner.
CI/CD & Automation
Build and maintain robust CI/CD pipelines that enable frequent, reliable, and automated software releases.
Drive automation across the stack, from provisioning and configuration to testing and deployment.
Database DevOps
Integrate DevOps best practices into database lifecycle management: automate backups, replication, scaling, and disaster recovery for cloud-native data services.
Production Debugging & Monitoring
Lead root cause analysis and resolution of complex production issues using logging, tracing, and metrics dashboards (e.g., Prometheus, Grafana, ELK).
Establish and refine observability practices including real-time monitoring, alerts, and performance tuning.
Continuous Improvement & Innovation
Identify opportunities to optimize infrastructure, deployment processes, and tooling for better agility, performance, and developer experience.
Stay ahead of industry trends to drive technical innovation and best practices adoption across the organization.
Leadership & Ownership
Take ownership of high-impact initiatives from planning to delivery, ensuring cross-functional alignment and measurable outcomes.
Anticipate and mitigate project and infrastructure risks, aligning solutions with broader business objectives.
Collaboration & Communication
Partner closely with Engineering, Security, QA, and Product teams to deliver infrastructure and platform improvements that enable scale.
Champion a DevOps culture, promoting shared ownership, transparency, and continuous learning.
Qualifications
- Experience: 7+ years in DevOps or cloud engineering (or equivalent practical experience).
- Cloud Expertise: Deep knowledge of AWS services (compute, storage, networking, security).
- Programming: Proficiency in Go, Python, or similar languages.
- IaC & Automation: Strong experience with Terraform and infrastructure-as-code.
- Problem-Solving: Ability to tackle complex challenges, balancing trade-offs.
- Collaboration: Strong cross-functional communication and stakeholder management.
- Ownership & Adaptability: Proven ability to drive projects independently and manage shifting priorities.
Desirable Skills:
- Automation: Experience automating infrastructure tasks to increase efficiency and reduce manual effort.
- CI/CD: Experience with CI/CD pipelines and tools such as ArgoCD, Jenkins, or GitLab CI/CD.
- Monitoring: Experience with monitoring and logging tools (e.g., Datadog, Splunk).
- Risk: Experience with security and compliance in cloud environments.
- Mentorship: Experience providing guidance and mentorship to less experienced engineers.
