Senior Network Engineer
Infrastructure operations · shared across customers
Reports to: Director, Network Engineering
Location: Remote (US) or Pleasanton, CA (hybrid)
Department: Infrastructure & DC Operations / Network Engineering
Position summary
The Senior Network Engineer designs, deploys, and operates the high-performance networking fabric supporting GPU clusters. This includes InfiniBand and RoCE fabrics for training workloads, customer-facing connectivity, and the wide-area network that connects STN sites and customer environments.
Key responsibilities
• Design and configure InfiniBand or RoCE fabrics optimized for GPU training and distributed inference
• Configure and operate switching, routing, and customer VLAN/VRF/VPC architectures
• Manage BGP peering, public IP space, anycast, and DDoS protection
• Design customer connectivity including cross-connects, dedicated links, VPN, and SD-WAN
• Maintain network automation, configuration management, and source-of-truth tooling
• Coordinate with the NOC on network monitoring, alerting, and runbook authoring
• Troubleshoot complex network issues across layers 1 through 7
• Maintain network documentation, diagrams, and operational runbooks
• Drive network capacity planning aligned to fleet growth and customer commitments
• Support security and compliance audits including SOC 2 and customer security reviews
Required qualifications
• 7+ years in network engineering with data center or service provider experience
• Deep expertise in InfiniBand or RoCE (RoCEv2), including congestion control and NCCL tuning
• Strong knowledge of BGP, OSPF, MPLS, VXLAN, and EVPN
• Hands-on experience with Arista, NVIDIA Mellanox/Spectrum, or Cisco platforms
• CCIE, JNCIE, NCIE, or equivalent advanced certification strongly preferred
Preferred qualifications
• GPU cluster networking experience at multi-thousand-GPU scale
• SDN and automation skills (Ansible, Python, Nautobot, or NetBox)
• Multi-site WAN and peering experience including IX participation
• Familiarity with NVIDIA Cumulus, SONiC, or open networking stacks
Secure, production-grade GPU cloud for AI teams. SOC 2 & HIPAA compliant with 99.999% uptime, no noisy neighbors, and expert human support.
Key team members
• Sabur Mian
• Christopher Chua
• Trevor Walker
• Tom Genn