STN Inc

Hardware Engineer

Posted 14 days ago

Remote

Infrastructure operations · shared across sites

Reports to: Director, Hardware Engineering

Location: Pleasanton, CA (hybrid) or assigned site; travel up to 25%

Department: Infrastructure & DC Operations / Systems Engineering

Position summary

The Hardware Engineer owns the hardware lifecycle for GPU and supporting infrastructure assets, including fleet health monitoring, RMA workflows, firmware management, and long-range capacity planning. The role is the technical owner of the physical compute platform.

Key responsibilities

  • Monitor GPU and server health, including thermals, error rates, and component failures

  • Drive the RMA process with vendors (NVIDIA, Supermicro, HPE, and others) end-to-end

  • Manage firmware, BIOS, and BMC upgrade campaigns across the fleet

  • Develop hardware burn-in and acceptance test procedures, including NCCL and stress tests

  • Investigate hardware failures and produce vendor-grade root cause analyses

  • Maintain hardware inventory, asset records, and CMDB accuracy

  • Drive capacity planning across compute, storage, and networking

  • Coordinate with Procurement on spare parts strategy and stocking levels

  • Author hardware engineering runbooks and operational procedures

  • Support new platform bring-up, qualification, and reference architecture validation

Required qualifications

  • 5+ years in hardware engineering, systems engineering, or data center engineering

  • Deep knowledge of x86 server architecture, GPU systems, and modern storage

  • Hands-on experience with NVIDIA HGX, DGX, or hyperscale-class systems

  • Strong Linux fundamentals and scripting skills (Python, Bash)

  • Bachelor's degree in computer science, electrical engineering, or a related field

Preferred qualifications

  • Experience with NVIDIA Mission Control, Base Command Manager, or Bright Cluster Manager

  • Familiarity with IPMI, Redfish, and vendor management interfaces

  • Knowledge of liquid cooling and high-density power architectures

  • Experience operating fleets of 1,000+ GPUs

Job details

Workplace: Remote

Location: Remote

Secure, production-grade GPU cloud for AI teams. SOC 2 & HIPAA compliant with 99.999% uptime, no noisy neighbors, and expert human support.

Employees: 83

Industry: IT Services and IT Consulting

Headquarters: Pleasanton, California

Founded: 2016

Specialties: Managed Services, SOC2 Certified, Cyber Security, Risk Assessments, HIPAA, Compliance, Managed SIEM, Backup, Recovery, Incident Response, Ransomware Prevention, Penetration Testing, Social Engineering, Network Engineering, and VAR Reseller

Key team members

Sabur Mian

Christopher Chua

Trevor Walker

Tom Genn