HPC - Team Lead
SHI | Locuz.com
Office
Telangana, Madhapur, India
Full Time
Hi,
We have an immediate requirement for an HPC Team Lead in Hyderabad with our organization, SHI Locuz Enterprise Solutions Pvt Ltd.
Please find the job description below:
Experience - 6+ years
Work location - Hyderabad
ROLE SUMMARY
The Technology Lead – HPC ensures that critical IT services and high-performance computing (HPC) infrastructure are available, efficient, and secure. The person in this role manages the daily operations of mission-critical systems in multiple clients' data centres, working closely with both facilities engineering teams (power, cooling, physical infrastructure) and IT infrastructure/operations teams to support clients around the clock. The role combines technical leadership, operations oversight, incident and problem management, and strategic planning.
PRIMARY ROLES & RESPONSIBILITIES
- Experience architecting and maintaining HPC/AI systems, including:
  - Linux system administration
  - Cluster management
  - System and software configuration management
  - High-speed networking
  - Resource managers and schedulers
  - High-speed parallel storage
  - Monitoring and alerting
- Strong understanding of HPC/AI architectures and concepts.
- Experience supporting and managing a group of HPC/AI clusters.
- Excellent knowledge of prototyping and deploying HPC/AI clusters.
- Extensive experience troubleshooting the Linux OS, filesystems, and cluster hardware.
- Good command of Linux scripting tools such as Bash, Perl, and Python.
- Experience implementing, maintaining, and verifying defined security policies.
- Willingness to maintain a flexible work schedule.
- A positive attitude and willingness to help lab users succeed.
- Excellent mentoring and teamwork skills.
TECHNICAL SKILLS
- Red Hat, Ubuntu, and SUSE operating systems
- Cluster tools (Bright, xCAT, Warewulf, OpenHPC, Rocks, etc.)
- InfiniBand
- Lustre, BeeGFS and GPFS architecture and maintenance
- Configuration management software (Ansible, Puppet)
- Slurm, PBS, LSF, and Grid Engine schedulers
- Spack package manager
- Experience deploying AI servers and software stacks
- Experience with container technologies and orchestration tools: Docker, Singularity, Apptainer, Kubernetes
- Hands-on with AI/ML tools: TensorFlow, PyTorch, Keras, ONNX, JAX.
- Experience in benchmarking and performance optimization of large-scale HPC/AI systems
- Experience with Linux and/or Windows operating systems, including file management, scripting, editing, and security
- Log consolidation and monitoring (Ganglia, Grafana, etc.)
- Lifecycle and patch management experience.
SOFT SKILLS
- Good logical reasoning and analytical skills
- Good communication skills
OTHER SKILLS
- Collaborative, cooperative, and committed mindset.
- Teamwork
- Excellent analytical and problem-solving skills.
- Ability to work independently and within cross-functional teams.
- Detail-oriented with good documentation practices.
- Excellent interpersonal, communication, customer-interaction, and documentation skills, along with sound decision-making ability.
September 30, 2025