Data Storage & Ingestion Consultant
Eon Systems.com
Office
San Francisco
Full Time
About Us
At Eon, we are at the forefront of large-scale neuroscientific data collection. Our mission is to enable the safe and scalable development of brain emulation technology to empower humanity over the next decade, beginning with the creation of a fully emulated digital twin of a mouse.
Role
We’re a San Francisco team collecting very large microscopy datasets and we need an expert to design and implement our end-to-end data pipeline, from high-rate ingest to multi-petabyte storage and downstream processing. You’ll own the strategy (on-prem vs. S3 or hybrid), the bill of materials, and the deployment, and you’ll be on the floor wiring, racking, tuning, and validating performance.
Our current instruments generate data at 1 GB/s or more sustained (higher during bursts), and the program will accumulate multiple petabytes over time. You’ll help us choose and implement the right architecture, balancing reliability against cost.
Outcomes (What Success Looks Like)
- Within 2 weeks: Implement an immediate data-handling strategy that reliably ingests our initial data streams.
- Within 2 weeks: Deliver a documented medium-term data architecture covering storage, networking, ingest, and durability.
- Within 1 month: Operationalize the medium-term pipeline in production (ingest → buffer → long-term store → compute access).
- Ongoing: Maintain ≥95% uptime for the end-to-end data-handling pipeline after setup.
Responsibilities
- Architect ingest & storage: Choose and implement an on-prem hardware and data pipeline design or a cloud/S3 alternative with explicit cost and performance tradeoffs at multi-petabyte scale.
- Set up a sustained-write ingest path ≥1 GB/s with adequate burst headroom (camera/frame-to-disk), including networking considerations, cooling, and throttling safeguards.
- Optimize footprint & cost: Incorporate on-the-fly compression/downsampling options and quantify CPU budget vs. write-speed tradeoffs; document when/where to compress to control $/PB.
- Integrate with acquisition workflows, ensuring image data and metadata are compatible with downstream stitching/flat-field correction pipelines.
- Enable downstream compute: Expose the data to segmentation/analysis stacks (local GPU nodes or cloud).
Skills
- 5+ years designing and deploying high-throughput storage or HPC pipelines (≥1 GB/s sustained ingest) in production.
- Deep hands-on experience with NVMe RAID/striping, ZFS/MDRAID/erasure coding, PCIe topology, NUMA pinning, Linux performance tuning, and NIC offload features.
- Proven delivery of multi-GB/s ingest systems and petabyte-scale storage in production (life-sciences, vision, HPC, or media).
- Experience building tiered storage systems (NVMe → HDD/object) and validating real-world throughput under sustained load.
- Practical S3/object-storage know-how (AWS S3 and/or on-prem S3-compatible systems) with lifecycle, versioning, and cost controls.
- Data integrity & reliability: snapshots, scrubs, replication, erasure coding, and backup/DR for PB-scale systems.
- Networking: 25/40/100 GbE (SFP+/SFP28), RDMA/RoCE/iWARP familiarity; switch config and path tuning.
- Ability to spec and rack hardware: selecting chassis/backplanes, RAID/HBA cards, NICs, and cooling strategies to prevent NVMe throttling under sustained writes.
Ideal Skills
- Experience with microscopy or scientific imaging ingest at frame-to-disk speeds, including Micro-Manager-based pipelines and raw-to-containerized format conversions.
- Experience with life science imaging data is a plus.
Engagement Details
- Contract (1099 or corp-to-corp); contract-to-hire if there’s a mutual fit.
- On-site requirement: You must be physically present in San Francisco during build-out and initial operations; local field work (e.g., UCSF) as needed.