Internship: Exploring Near Real-Time Data Processing at Scale with Apache Flink and Apache Spark

ELCA Group

Office

Pully, Switzerland

Full Time

Description

This internship offers an in-depth exploration and comparison of two leading stream processing frameworks, Apache Flink and Apache Spark, in the context of near real-time data processing.

The intern will gain hands-on experience designing and implementing near real-time data pipelines, using Apache Kafka as the messaging backbone and processing the data streams with both Flink and Spark. The project will include practical use cases built on real-world data sources, such as change-event streams from databases or web activity logs.

The final deliverable will consist of performance benchmarks, scalability assessments, and recommendations outlining the strengths and limitations of each framework across different data streaming scenarios.

Objectives

Understand the fundamental concepts of Kafka, Flink, and Spark, including their architecture and use cases.
Implement a pipeline that processes streaming data from a single source using Kafka and Flink/Spark, in order to gain insights into these technologies and test optimizations.
Build a second pipeline with a more complex setup:

Database → Debezium → Kafka → Flink/Spark → Operational and Analytical Queries.

Handle multiple tables and implement watermarking so that the streams from the different tables are processed consistently in event time.

Compare Flink and Spark based on performance, ease of use, and suitability for specific use cases.
Document findings and propose guidelines for choosing between the two frameworks.

Our offer
•    A dynamic, collaborative work environment with a highly motivated, multicultural team spread across international sites
•    The chance to make a difference in people’s lives by building innovative solutions
•    Various internal coding events (hackathons, brown-bag sessions); see our technical blog
•    Monthly after-work events organized at each location

Skills Required

Core Skills:

Basics of data engineering and distributed systems.
Knowledge of SQL and database concepts (e.g., relational databases, transactions).
Understanding of streaming concepts and data pipelines (e.g., Kafka, Flink, Spark).

Technical Skills:

Familiarity with Docker and containerized environments.
Knowledge of Kafka and its core concepts, such as producers, consumers, topics, and partitions.
Basic programming skills in Python, Java, or Scala.
Understanding of event-driven architectures and change data capture (CDC) tools (experience with Debezium is a plus).
Exposure to cloud platforms (e.g., AWS, Azure, or GCP) is an advantage.

Other Skills:

Analytical thinking and problem-solving skills.
Ability to learn new tools and technologies quickly.
Interest in benchmarking and performance evaluation.

This internship starts in February 2026.

Applications must include your most recent academic transcripts (grades); applications without transcripts will not be considered.