
Lead Site Reliability Engineer | Production Infrastructure
Jump Trading
Posted about 8 hours ago
Jump Trading Group is committed to world-class research. We empower exceptional talents in Mathematics, Physics, and Computer Science to seek scientific boundaries, push through them, and apply cutting-edge research to global financial markets. Our culture is unique. Constant innovation requires fearlessness, creativity, intellectual honesty, and a relentless competitive streak. We believe in winning together and unlocking unique individual talent by incenting collaboration and mutual respect. At Jump, research outcomes drive more than superior risk-adjusted returns. We design, develop, and deploy technologies that change our world, fund start-ups across industries, and partner with leading global research organizations and universities to solve problems.
CORE (Central Ops and Reliability Engineering) is the Production Infrastructure team responsible for operating and improving Jump’s production trading environment. The team combines deep operational ownership with software and reliability engineering practices to support production systems, drive incident and change management, improve observability and deployment workflows, and reduce operational toil across a fast-moving global trading platform.
What You’ll Do:
As Lead Site Reliability Engineer in CORE, you will both manage and mentor engineers across teams and contribute directly to key projects, balancing leadership responsibilities with hands-on work.
- Design & Build: Architect and implement high-performance monitoring and alerting systems, real-time packet/flow analysis tooling, and automation frameworks for managing Jump’s global production footprint.
- Lead Operational Maturity: Oversee and improve incident management, change management, and post-incident review processes to increase resilience and reduce downtime.
- Drive Efficiency: Identify and eliminate sources of operational toil through automation and tooling.
- Collaborate Globally: Partner with engineering, networking, and trading teams in multiple regions to align technical priorities with business objectives.
- Debug Deeply: Investigate low-level performance issues across complex software stacks, optimizing for ultra-low latency and high throughput.
- Shape the Roadmap: Influence the strategic direction of production tooling, infrastructure scaling, and vendor partnerships.
Skills You’ll Need:
- Proven leadership experience managing people across distributed teams.
- Demonstrated history of solving reliability challenges in large-scale production environments.
- Demonstrated strategic thinking and maturity in tackling complex problems that span people, technology, and processes.
- Strong programming skills in Python, Go, or equivalent.