Senior AI Infrastructure & Platform Engineer - Riyadh, KSA
Posted 7 days ago
Office: New Delhi, Delhi, India
Role Overview
We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.
Key Responsibilities
- Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
- Manage and operate GPU orchestration tools and platforms such as:
  - Nvidia Base Command Manager (critical)
  - Nvidia AI Enterprise Suite
  - Nvidia GPU and Network Operators
  - Nvidia NIMs and Blueprints
- Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
  - Slurm (critical)
  - Vanilla Kubernetes
- Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
- Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
- Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
- Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
- Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.
Requirements
Required Skills & Experience
- Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
- Hands-on experience with:
  - Nvidia Base Command Manager
  - Nvidia AI Enterprise Suite
  - Nvidia GPU/Network Operators, NIMs, Blueprints
- Strong experience with Slurm and/or Kubernetes orchestration.
- Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
- Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
- Excellent troubleshooting and performance-tuning skills.
- Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
- Strong understanding of networking, security, resource allocation, and cluster management best practices.
Preferred Qualifications
- Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
- Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
- Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
- Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.