STN Inc

Senior Platform Engineer

Posted 12 days ago

Platform and software · shared across customers

Reports to: Director, Platform Engineering (or Chief Architect)

Location: Remote (US) or Pleasanton, CA (hybrid)

Department: Cloud Platform Engineering / GPU Platform Engineering

Position summary

The Senior Platform Engineer builds and operates the multi-tenant orchestration, scheduling, and customer-facing platform layer that turns raw GPU infrastructure into a usable cloud service. This role is the software backbone of GPU One (GPUaaS).

Key responsibilities

  • Design and build the orchestration layer (Kubernetes, Slurm, Run:ai, or comparable)

  • Manage multi-tenant isolation including namespaces, networking, storage, and quotas

  • Build customer-facing platform APIs, CLIs, web portals, and SDKs

  • Implement and operate image management, GPU operator, and node provisioning automation

  • Drive infrastructure-as-code and automation across the platform stack

  • Partner with SRE on platform reliability, SLO definition, and observability

  • Assist TAM and Support engineers with customer-impacting platform issues

  • Maintain customer environment templates, configuration management, and rollout tooling

  • Participate in architecture reviews, design discussions, and technical roadmap planning

  • Drive continuous platform improvement and reduce operational toil

Required qualifications

  • 6+ years in platform engineering, SRE, or cloud engineering at scale

  • Deep Kubernetes expertise including CRDs, operators, and multi-tenant patterns

  • Strong programming skills in Go, Python, or both

  • Experience operating GPU clusters or AI infrastructure at production scale

  • Bachelor's degree in computer science or equivalent experience

Preferred qualifications

  • Experience with NVIDIA GPU Operator, MIG, MPS, and NCCL operator patterns

  • Familiarity with Slurm operator, Run:ai, KubeRay, or comparable AI orchestration

  • Service mesh experience (Istio, Linkerd) and multi-cluster networking

  • Open source contributions in the cloud-native or AI infrastructure ecosystem

Job details

Workplace: Remote
Location: Remote
Experience: SE

Secure, production-grade GPU cloud for AI teams. SOC 2 & HIPAA compliant with 99.999% uptime, no noisy neighbors, and expert human support.

Employees: 83
Industry: IT Services and IT Consulting
Headquarters: Pleasanton, California
Founded: 2016
Specialties: Managed Services, SOC2 Certified, Cyber Security, Risk Assessments, HIPAA, Compliance, Managed SIEM, Backup, Recovery, Incident Response, Ransomware Prevention, Penetration Testing, Social Engineering, Network Engineering, and VAR Reseller

Key team members

Sabur Mian

Christopher Chua

Trevor Walker

Tom Genn
