Software Engineer, ML Data Platform

Mirage.com

185k - 285k USD/year

Office

Union Square, New York City

Full Time

Mirage is the leading AI short-form video company. We’re building full-stack foundation models and products that redefine video creation, production and editing. Over 20 million creators and businesses use Mirage’s products to reach their full creative and commercial potential.

We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you’ll have an opportunity to have an outsized impact on our products and our company's culture.

Our Products

Captions

Mirage Studio

Our Technology

AI Research @ Mirage

Mirage Model Announcement

Seeing Voices (White-Paper)

Press Coverage

TechCrunch

Lenny’s Podcast

Forbes AI 50

Fast Company

Our Investors

We’re very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SV Angel, 20VC, Ludlow Ventures, Chapter One, and more.

Please note that all of our roles require you to work in person at our NYC HQ (located in Union Square).

We do not work with third-party recruiting agencies; please do not contact us.

About the Role
We’re looking for a Software Engineer to help build and scale the data systems that power our machine learning products. This role sits at the intersection of data engineering and ML infrastructure: you’ll design large-scale streaming pipelines, build tools that abstract infrastructure complexity for feature developers, and ensure that our feature data is reliable, discoverable, and performant across online and offline environments. If you’re passionate about building foundational systems that enable machine learning at scale — and love solving complex distributed data problems — this is the role for you.

What You’ll Do

  • Design and scale feature pipelines: Build distributed data processing systems for feature extraction, orchestration, and serving — including real-time streaming, batch ingestion, and CDC workflows.
  • Deliver production-ready features: Design and implement reliable, reusable feature extraction pipelines for ML models, ensuring features are accurate and scalable through well-designed SDKs and orchestration tools.
  • Build and evolve storage infrastructure: Manage multi-tier data systems (e.g. Bigtable for online features/state, BigQuery for analytics and offline training), including schema evolution, versioning, and compatibility.
  • Own orchestration and reliability: Lead workflow orchestration design (e.g. Pub/Sub, Busboy, Airflow/Temporal), monitoring, and alerting to ensure reliability at 100M+ video scale.
  • Collaborate with ML teams: Partner with ML engineers on feature availability, dataset curation, and streaming pipelines for training and inference.
  • Optimize for performance and cost: Tune GPU utilization, resource allocation, and data processing efficiency to maximize system throughput and minimize cost.
  • Enable analytics and insights: Support downstream analytics and data science workflows by ensuring data accessibility, discoverability, and performance at scale.

Preferred Qualifications

  • 4+ years building distributed data systems, feature platforms, or ML infrastructure at scale.
  • Strong experience with streaming and batch pipelines (e.g. Pub/Sub, Kafka, Dataflow, Beam, Flink, Spark).
  • Experience with Kubernetes, containerized data infrastructure, and workflow orchestration tools (e.g. Airflow, Temporal).
  • Familiarity with ML workflows and feature store design — enough to partner closely with ML teams.
  • Deep knowledge of cloud-native data stores (e.g. Bigtable, BigQuery, DynamoDB, Snowflake) and schema/versioning best practices.
  • Proficiency in Python and experience building developer-facing libraries or SDKs.

Bonus: Experience working with video, audio, or other unstructured media data in a production environment.

Benefits:

  • Comprehensive medical, dental, and vision plans
  • 401(k) with employer match
  • Commuter benefits

  • Catered lunch multiple days per week
  • Dinner stipend every night if you're working late and want a bite!
  • Grubhub subscription

  • Health and wellness perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)
  • Multiple team offsites per year with team events every month
  • Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Please note that benefits apply to full-time employees only.

October 17, 2025
