About Mindera

At Mindera, we build high-performing, cross-functional teams that solve complex business challenges through technology. We partner with global clients to deliver innovative, scalable, and cloud-native solutions while fostering a collaborative engineering culture built on autonomy, ownership, and continuous learning.

We are looking for a Data Engineer with strong expertise in Generative AI data pipelines, Vector Databases, AWS, and modern data engineering technologies to help build next-generation AI-powered platforms. This role will focus on developing scalable pipelines for Large Language Model (LLM) applications, Retrieval-Augmented Generation (RAG), and Voice AI solutions.

Role Overview

As a Data Engineer (Generative AI & Vector Systems), you will be responsible for designing and building scalable data pipelines that prepare, transform, and index enterprise data for AI applications. You will work closely with AI Engineers, Data Scientists, Machine Learning Engineers, and Software Engineers to enable high-performance semantic search and retrieval systems using Vector Databases and cloud-native technologies.

This is an exciting opportunity to work on cutting-edge Generative AI solutions involving embeddings, vector search, RAG architectures, and large-scale cloud data processing.

Requirements

Key Responsibilities

AI Data Engineering

Design and develop scalable data ingestion pipelines for AI and ML applications.
Build automated pipelines to clean, transform, chunk, enrich, and load enterprise data into Vector Databases.
Create efficient workflows for generating embeddings from structured and unstructured data.
Optimize data quality for semantic search and retrieval systems.

Vector Database Engineering

Design and manage Vector Database architectures.
Optimize indexing, storage, metadata management, and retrieval performance.
Improve similarity search performance and retrieval latency.
Work with embedding models and vector search optimization techniques.

Data Pipeline Development

Develop production-grade ETL/ELT pipelines.
Build batch and near real-time ingestion pipelines.
Automate workflow orchestration using Apache Airflow.
Monitor pipeline performance and ensure high availability.

Cloud Data Engineering

Develop cloud-native solutions on AWS.
Work with services such as Amazon S3, AWS Glue, Amazon EMR, Lambda, Athena, IAM, and CloudWatch.
Optimize compute resources for large-scale AI workloads.

SQL & Data Processing

Write complex SQL queries across distributed data sources.
Use Starburst/Trino to federate and query multiple data platforms.
Design efficient data models for AI workloads.
Perform large-scale joins and transformations.

AI & Machine Learning Support

Work closely with AI engineers to support LLM-based applications.
Build Retrieval-Augmented Generation (RAG) pipelines.
Manage embedding generation and vector indexing.
Support prompt engineering and retrieval optimization initiatives.

Performance & Reliability

Improve data pipeline performance and scalability.
Implement monitoring, logging, and alerting.
Troubleshoot production data issues.
Ensure data integrity and governance.

Required Technical Skills

Programming

Python (Advanced)
SQL (Advanced)

Data Engineering

ETL / ELT
Data Transformation
Data Cleansing
Data Chunking Strategies
Metadata Management
Data Modeling

Vector Databases

Hands-on experience with one or more:

Pinecone
Milvus
Qdrant
Chroma
Weaviate (Good to have)
FAISS (Good to have)

Workflow Orchestration

Apache Airflow

Cloud

Strong experience with AWS services:

Amazon S3
AWS Glue
Amazon EMR
Lambda
Athena
IAM
CloudWatch

Query Engines

Experience with:

Starburst
Trino
Presto (Good to have)

AI / Machine Learning

Working knowledge of:

Large Language Models (LLMs)
Text Embeddings
Semantic Search
Vector Search
Retrieval-Augmented Generation (RAG)
Prompt Engineering (basic understanding)

APIs & Integrations

REST APIs
JSON
Data Connectors

Required Experience

4–8 years of experience in Data Engineering.
Experience building scalable cloud-native data platforms.
Hands-on experience with Vector Databases.
Experience working with enterprise-scale SQL environments.
Strong background in Python-based data engineering.
Experience building production-grade Airflow pipelines.
Familiarity with Generative AI architectures.
Experience supporting Machine Learning pipelines.

Benefits

We offer

Flexible working hours (self-managed)
Annual bonus, subject to company performance
Access to Udemy online training and opportunities to learn and grow within the role

At Mindera, we use technology to build products we are proud of, with people we love.

Software Engineering Applications, including Web and Mobile, are at the core of what we do at Mindera.

We partner with our clients to understand their products and deliver high-performance, resilient, and scalable software systems that create an impact on their users and businesses across the world.

You get to work with a bunch of great people, and the whole team owns the project together.

Our culture reflects our lean, self-organising attitude.

We encourage our colleagues to take risks, make decisions, work collaboratively, and talk to everyone to enhance communication. We are proud of our work, and we love to learn anything and everything while navigating an Agile, Lean, and collaborative environment.

Data Engineer - Generative AI & Vector Systems