Software Developer 3

Oracle
Bangalore | 3-12 LPA | Posted 24 Oct 2025
FULL TIME
Docker
Microservices
Kubernetes
Prometheus
Java

Job Description

Basic Qualifications:

  • BS or MS degree in CS or related engineering or science field with 3-5+ years of relevant experience
  • Experience benchmarking systems and troubleshooting or optimizing their performance.
  • Experience with coding, scripting, and automation.
  • Background in networking.
  • General Linux skills.
  • Demonstrated ability to lead complex projects, independently resolve ambiguity, collaborate with stakeholders across teams, and communicate effectively.

Desired qualifications:

  • Experience working with clusters, e.g., running HPC/AI workloads or maintaining HPC/AI systems.
  • Experience troubleshooting or tuning performance on distributed systems.
  • Familiarity with elements of the AI/HPC software stack such as job schedulers (e.g., Slurm); NCCL, RCCL, or MPI; or ML frameworks.
  • Experience with RDMA networking, e.g., RoCE or InfiniBand.
  • Experience architecting or developing solutions on a public cloud platform.

Responsibilities

  • Carry out performance studies on GPU clusters, with a focus on AI/ML workload performance, network performance, and tuning.
  • Design and code solutions for performance benchmarking.
  • Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on novel systems whose behavior is not yet fully understood.
  • Document new tools and procedures to a high standard.
  • Write whitepapers to disseminate findings of performance studies.
  • Participate in architecture design and review, code review, and contribute to roadmap development.
  • Mentor junior engineers.
  • Participate in operational rotations.