
Responsibilities
- Design, develop, and maintain scalable ETL/ELT pipelines using Databricks.
- Build and optimize data workflows using Apache Spark (PySpark/Scala).
- Implement data ingestion from multiple sources (APIs, databases, streaming platforms).
- Develop and manage data lakes and lakehouse architectures.
- Work with cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud Platform.
- Optimize performance of queries and large-scale data processing jobs.
- Ensure adherence to data quality, governance, and security best practices.
- Collaborate with data scientists, analysts, and business stakeholders to deliver data solutions.
- Implement CI/CD pipelines and version control for data engineering workflows.
Requirements
- 5+ years of experience in data engineering or big data development.
- Strong hands-on experience with Databricks and Apache Spark (PySpark preferred).
- Proficiency in Python and SQL; Scala is a plus.
- Experience with data modeling, data warehousing, and ETL design.
- Hands-on experience with cloud platforms (AWS/Azure/GCP).
- Familiarity with tools such as Apache Airflow, Apache Kafka, and Delta Lake.
- Strong understanding of distributed computing and big data architecture.