Site Reliability Engineer
Ford Motor Company
Office
Mexico
Full Time
Site Reliability Engineering at Ford Motor Company plays a critical role in maintaining and improving the reliability, scalability, and performance of our services. You will work closely with our development teams to build and maintain large-scale, distributed systems and ensure our products meet our high standards for availability and user experience.
Key Responsibilities Include:
- Continuously monitoring the availability, reliability, and performance of systems, platforms, and applications, maintaining a holistic view of system health.
- Regularly reviewing key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization.
- Providing primary operational and engineering support for multiple large, distributed software applications.
- Proactively identifying stability risks and working with engineering leadership to establish appropriate mitigation plans.
- Using automation tools, scripts, and processes to reduce or eliminate repetitive tasks, thereby improving the support provided by Site Reliability Engineering.
- Creating or modifying Terraform files according to Ford formats to develop new monitoring dashboards and alert policies.
Basic Qualifications:
- Bachelor’s degree in computer science, engineering, mathematics or equivalent experience.
- 3+ years of experience as an SRE, DevOps Engineer, Software Engineer, or in a similar role.
- 3+ years of experience with Python, Java, C/C++, Ruby, and JavaScript.
- 3+ years of experience with J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, Docker/K8s, RESTful APIs, and microservices platforms.
- 3+ years of experience with APM or other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, Datadog, or PagerDuty.
Preferred Qualifications:
- Strong experience with monitoring and observability tools, particularly Dynatrace and OpenTelemetry.
- Experience using Cycode ASPM.
- Proficient with cloud services, with a strong preference for Google Cloud Platform (GCP) experience.
- Solid programming skills in Java, Golang, or other programming languages, with a good understanding of software development best practices.
- Experience with relational and document databases.
- Familiarity with front-end development frameworks, particularly React.
- Ability to debug, optimize code, and automate routine tasks.
- Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
- Excellent verbal and written communication skills.
July 18, 2025