At the Ellison Institute of Technology (EIT), we’re on a mission to translate scientific discovery into real-world impact. We bring together visionary scientists, technologists, policy makers, and entrepreneurs to tackle humanity’s greatest challenges in four transformative areas:

Health, Medical Science & Generative Biology
Food Security & Sustainable Agriculture
Climate Change & Managing CO₂
Artificial Intelligence & Robotics

This is ambitious work - work that demands curiosity, courage, and a relentless drive to make a difference. At EIT, you’ll join a community built on excellence, innovation, tenacity, trust, and collaboration, where bold ideas become real-world breakthroughs. Together, we push boundaries, embrace complexity, and create solutions that scale ideas from lab to society. Explore more at www.eit.org

Requirements

Job Summary:

Our platform connects physical hardware - robotic systems, ambient sensor rigs, lab hardware - to a cloud-native data platform that captures, stores, and serves high-frequency multimodal data. The platform spans an edge-to-cloud telemetry stack (MQTT, Kafka, multimedia streams), REST and streaming APIs, and a growing set of hardware integrations across multiple sites.

As a Data Engineer, you’ll work directly alongside hardware engineers, scientists, and the core platform team to build the data layer that makes our autonomous operations possible. This is a hands-on, high-impact role for someone who is comfortable combining hardware data streams with reproducible, version-controlled, and well-structured data pipelines, and who is at ease bringing their own expertise into diverse groups.

Day-to-Day, You Might

Integrate new lab instruments and robotic systems into the platform - writing device APIs, defining event schemas, and validating data at the edge before it reaches cloud storage.
Build and maintain high-throughput, low-latency ingestion pipelines for streaming data: live video feeds, sensor telemetry, robot joint state, and instrument control signals.
Work across the edge-to-cloud stack - from configuring edge devices and MQTT brokers through to Kafka topics, cloud storage, and the APIs that expose data to scientists and downstream ML pipelines.
Design and evolve the common data model for hardware execution and scientific outcome data, with a strong focus on schema stability, versioning, and provenance.
Collaborate with scientists and hardware engineers to turn raw instrument output into research-ready, schema-validated data for model training.
Contribute to an engineering culture that values maintainability, testing, robust system design, and deep collaboration, but allows flexibility for rapid prototyping and responsiveness to changing landscapes.

What Makes You a Great Fit

Nobody checks every box - if you’re not sure whether you’re qualified, we still encourage you to apply.

You have strong programming experience in Python, and value code quality, reliability, and readability as much as performance.
You have experience working on cloud compute platforms, containers and Linux environments.
You think in systems and own them end-to-end - from device output to APIs - and embrace long-term engineering rather than one-off scripts.
You have hands-on experience building data pipelines for physical systems: autonomous vehicles, robotics, clinical/lab instruments, industrial control systems, or similar environments.
You have experience with real-time data streaming: Kafka or equivalent message brokers, MQTT or similar protocols, and the challenges that come with high-frequency, low-latency data capture.

Great to Also Have

Familiarity with live video or high-bandwidth media streams as data engineering problems is a strong plus.
Experience in automated lab, clinical lab, or life sciences environments - understanding of instrument APIs, lab protocols, and the data quality expectations of scientific workflows.
Comfort with time-series data from sensors and control systems, including sampling rates, data loss handling, and operational (driving live systems) vs analytical (modelling) use cases of the same data.
Understanding of closed-loop control systems and the data infrastructure needed to support real-time decision making.

Why This Role

You'll be part of a nimble team building infrastructure that directly enables autonomous science at scale. The hardware is real, the data volumes are high, the use cases range from live lab control to training foundation models - and the decisions you make about schemas, protocols, and pipeline architecture now will shape how the platform grows across new sites and hardware categories in 2027 and beyond.

Benefits

We offer the following salary and benefits:

Enhanced holiday pay
Pension
Life Assurance
Income Protection
Private Medical Insurance
Hospital Cash Plan
Therapy Services
Perk Box
Electric Car Scheme

Why work for EIT:

At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged, and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to have a shared commitment to excellence. Join us and make an impact!

Forward Deployed Data Engineer
