Oracle
Software Developer 3
Bangalore ₹3-12 LPA Posted 24 Oct 2025
FULL TIME
Docker
Microservices
Kubernetes
Prometheus
Java
Job Description
Basic Qualifications:
- BS or MS degree in CS or a related engineering or science field, with 3-5+ years of relevant experience.
- Experience with benchmarking, and with troubleshooting or optimizing system performance.
- Experience with coding, scripting, and automation.
- Background in Networking.
- General Linux skills.
- Demonstrated ability to lead complex projects, independently resolve ambiguity, collaborate with stakeholders across teams, and communicate effectively.
Desired Qualifications:
- Experience working on clusters, e.g., running HPC/AI workloads, or maintaining an HPC/AI system.
- Experience troubleshooting or tuning performance on distributed systems.
- Familiarity with elements of the AI/HPC software stack such as job schedulers (e.g., Slurm); NCCL, RCCL, or MPI; or ML frameworks.
- Experience with RDMA networking, i.e., RoCE or InfiniBand.
- Experience architecting or developing solutions on a public cloud platform.
Responsibilities:
- Carry out performance studies on GPU clusters, with a focus on AI/ML workload performance, network performance, and tuning.
- Design and code solutions for performance benchmarking.
- Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on novel systems that are not yet fully understood.
- Document new tools and procedures to a high standard.
- Write whitepapers to disseminate findings of performance studies.
- Participate in architecture design and review, code review, and contribute to roadmap development.
- Mentor junior engineers.
- Participate in operational rotations.
