Network Observability Manager – Site Reliability
Ford Motor Company
Office
Dearborn, MI, United States
Full Time
Site Reliability Engineering at Ford Motor Company plays a critical role in maintaining and improving the reliability, scalability, and performance of our services. As a Network Observability Manager – Site Reliability, you will manage a team of Network Observability SREs, own network observability tools and deliver a high-quality experience for your customers across Ford.
- Lead and manage a team of Network Observability SREs to design, build and maintain comprehensive network monitoring solutions.
- Own and evolve the Network Observability ecosystem, including tools such as SevOne, IBM CloudPak, Netcool and ThousandEyes.
- Drive adoption of SRE best practices including SLIs, SLOs, alert tuning, incident management and post-mortem analysis within the network operations function.
- Collaborate cross-functionally with SRE, Infrastructure, Cloud and Security teams to ensure visibility and reliability across hybrid environments.
- Define and enforce standards for logs, metrics, traces and telemetry across network hardware and platforms.
- Support incident detection and troubleshooting by providing best-in-class observability tools that reduce MTTX (mean time to detect, resolve, etc.) and proactively identify potential issues.
- Ensure observability integrations for all key networking hardware including switches, routers, firewalls and load balancers.
- Report on key performance indicators and lead initiatives to improve network reliability and visibility.
- Manage and grow engineers by providing regular mentorship, career coaching and performance feedback.
- Evaluate and recommend new and emerging products and technologies.
- Provide thought leadership and perspective across multiple organizations to eliminate knowledge silos.
- Drive continuous improvement and build a learning organization.
You’ll have…
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering or a related field, or equivalent work experience.
- 7+ years of hands-on experience as a Network Engineer, with deep understanding of enterprise network architecture and operations.
- 5+ years of proven experience in a management role, with a focus on coaching, development, and performance management of engineering teams.
- Strong command of networking principles and protocols: routing (OSPF, BGP), ARP, VLANs, ACLs, SNMP, QoS, etc.
- Familiarity with major networking hardware vendors (Cisco, Juniper, Palo Alto, F5, etc.).
- Deep understanding of SRE principles including SLAs, SLIs, SLOs, error budgets and incident lifecycle management.
- Experience in hybrid (on-prem/cloud) environments and integrating observability across both.
- Excellent problem-solving skills, an analytical mindset and the ability to manage and prioritize multiple tasks effectively.
- Strong communication and collaboration skills to work across both technical and non-technical teams.
Even better, you may have...
- Certifications such as CCNP/CCIE, AWS Advanced Networking, or Certified SRE.
- Experience with automation and scripting (e.g., Python, Ansible) for network telemetry and monitoring.
- Familiarity with cloud-native monitoring solutions (e.g., Datadog, Grafana, Splunk, Prometheus).
- Extensive experience creating architectures that support reactive, distributed, secure, performant, service-oriented systems.
- Strong verbal and written communication skills, with the ability to influence across the enterprise.
September 11, 2025