Senior Engineering Manager, Management Plane Systems

Crusoe

Posted about 10 hours ago

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About the Role:

As we scale our AI infrastructure, we are investing deeply in the software systems that manage, observe, and heal our network at scale. We are hiring a Senior Engineering Manager, SDN Management Plane to lead the team responsible for the automation, observability, configuration management, and policy enforcement layer that runs across our entire network fleet.

This is a senior software engineering leadership role. The Management Plane is the horizontal layer that ties together our control and data plane systems, making our network self-aware, self-healing, and continuously verifiable. You will lead a team of senior and staff software engineers while remaining deeply engaged in platform architecture, systems design, and the technical roadmap.

This is not a network operations or SRE role. It is a platform engineering leadership position where your primary output is software: automation systems, observability pipelines, configuration management platforms, and the tooling that eliminates manual toil at scale. You will apply sound software engineering principles to hard networking problems, including the application of GenAI and machine learning to network operations.

What You'll Be Working On:

  • Platform Architecture & Engineering

    • Own the architecture, development, and production operation of Crusoe's SDN Management Plane, the automation and observability layer that manages our network fleet across all regions.

    • Build and operate CI/CD pipelines for network configuration: automated testing, policy validation, and push-on-green delivery of network changes from intent to production.

    • Design and implement the software systems that enforce reconciliation between declared and actual network state, detect configuration drift, and trigger automated remediation workflows.

    • Define provisioning and onboarding automation for new nodes, regions, and customer environments, ensuring consistent, policy-compliant network configuration at scale.

  • Observability and Intelligent Operations

    • Drive the design of network observability systems including streaming telemetry (gNMI/gRPC), synthetic probing, anomaly detection, and real-time traffic monitoring across GPU clusters.

    • Design and implement self-healing network capabilities: closed-loop automation with appropriate guardrails that detects, diagnoses, and resolves network faults without human intervention.

    • Set the technical vision for applying GenAI and machine learning to network operations, from intelligent anomaly detection to natural-language-driven network management.

  • Cross-Functional Partnership

    • Partner closely with Control Plane and Data Plane teams to ensure clean software interfaces between layers, and with infrastructure and compute teams to support GPU cluster networking requirements.

    • Act as the internal platform owner for network automation: treat other engineering teams as customers with real product requirements, not just consumers of scripts.

  • People Leadership

    • Lead, mentor, and grow a team of senior and staff-level software and network automation engineers.

    • Set technical standards, review architecture and design decisions, and own team performance and development.

    • Foster a high-ownership engineering culture focused on shipping production software, not just maintaining tooling.

What You'll Bring to the Team:

  • 10+ years of experience in network software engineering, network automation platform engineering, or infrastructure platform engineering.

  • 5 to 7+ years managing senior and staff-level software engineers, with demonstrated ability to build and scale a platform team.

  • Proven track record of architecting and shipping production-grade automation and observability systems, not just configuring or consuming existing tooling.

  • Deep hands-on experience building network automation platforms: architecting and owning systems that other engineering teams depend on as internal customers.

  • Strong fluency in network automation frameworks and tooling: Ansible, Nornir, NAPALM, Salt, or equivalent. Proven experience building production CI/CD pipelines for network infrastructure, including test coverage, rollback logic, and policy validation.

  • Experience with network source-of-truth systems (NetBox, Nautobot, or custom CMDB) and building software-driven reconciliation loops between declared and observed network state.

  • Familiarity with network telemetry and observability systems: gNMI, gRPC streaming telemetry, OpenTelemetry, or equivalent synthetic probing and monitoring architectures.

  • Solid understanding of network protocols and SDN architectures: BGP, VXLAN, EVPN, and familiarity with control plane systems (OVN/OVS preferred) at the level needed to automate them effectively.

  • Experience with network modeling standards: YANG, NETCONF, RESTCONF, or intent-based networking abstractions.

  • Strong software engineering background with fluency in Python and/or Go. Able to set code quality standards, define testing strategies, and review complex platform code at a staff engineer level.

  • Demonstrated ability to lead in fast-moving, execution-heavy environments: comfortable building from scratch, shipping iteratively, and owning production systems end-to-end.

  • Track record of managing platform teams with internal customers, able to balance roadmap commitments with operational reliability and stakeholder needs.

  • Clear platform mindset: you have built software that other teams depend on, defined its interfaces, and owned its reliability as a product.

Bonus Points

  • Experience applying GenAI, ML, or AIOps techniques to network operations: anomaly detection, predictive failure analysis, or natural-language configuration interfaces.

  • Background in AI infrastructure or GPU cluster networking environments.

  • Contributions to open-source network automation or observability projects.

  • Experience with release management and change control systems for large-scale network infrastructure.

  • Familiarity with RDMA/RoCE or high-performance networking in GPU environments.

  • P4 or programmable networking pipeline experience.

Benefits:

  • Industry competitive pay

  • Restricted Stock Units in a fast-growing, well-funded technology company

  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

  • Employer contributions to HSA accounts

  • Paid Parental Leave

  • Paid life insurance, short-term and long-term disability

  • Teladoc

  • 401(k) with a 100% match up to 4% of salary

  • Generous paid time off and holiday schedule

  • Cell phone reimbursement

  • Tuition reimbursement

  • Subscription to the Calm app

  • MetLife Legal

  • Company-paid commuter benefit: $300/month

Job details

  • Workplace: Office

  • Location: US

  • Experience: SE

  • Salary: 237k - 288k USD per year
