Job Description
The High-Performance Computing Storage Engineer is primarily responsible for the overall health and maintenance of storage technologies in our managed services customers' environments. Our Storage Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management, and change management infrastructure support for all Managed Services customers.
Key Responsibilities
- Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
- Plan and perform maintenance activities
- Assess customer environments for performance and design issues and propose resolutions
- Work across technical teams to troubleshoot complex infrastructure issues
- Create and maintain detailed documentation
- Serve as a subject matter expert and escalation point for storage technologies
- Work with vendors to resolve storage issues
- Communicate transparently with customers and internal teams
- Participate in on-call rotation
- Complete training and certification as assigned to further skills and knowledge
Skills Required
- Bachelor's degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education
- 5+ years of expert-level experience managing storage infrastructure in high-performance computing environments, including file systems, storage appliances, and data workflows.
- Experience configuring, maintaining, and tuning Ceph clusters.
- Experience configuring, maintaining, and tuning distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)
- Experience with InfiniBand networking preferred.
- 1+ years working with monitoring platforms; Elastic Observability is a bonus
- 1+ years working with an enterprise ITSM system; ServiceNow is a bonus
- Familiarity with high-performance computing (HPC) schedulers (e.g., SLURM, PBS, Torque) and their interaction with data storage systems.
- Understanding of data protection mechanisms, including data replication, backup strategies, and disaster recovery in HPC environments.
- Experience with containerization (Docker, Singularity) in an HPC context for data processing and application deployment.
- Solid working knowledge of Linux and scripting a plus.
- Experience with machine learning or data science workflows in HPC environments a plus.
- Managed Services or consulting experience is required.
- Strong background with customer service
- High-level problem-solving and communication skills
- Strong oral and written communication skills
- Related Storage certifications are a bonus.
Why AHEAD:
- Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.
- We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning.
USA Employment Benefits include:
- Medical, Dental, and Vision Insurance
- 401(k)
- Paid company holidays
- Paid time off
- Paid parental and caregiver leave
- Plus more! See https://www.aheadbenefits.com/ for additional details.
The compensation range indicated in this posting reflects the On-Target Earnings (OTE) for this role, which includes a base salary and any applicable target bonus amount. This OTE range may vary based on the candidate's relevant experience, qualifications, and geographic location.
