
About this role
Overview
The Senior Data Engineer is responsible for designing, developing, and managing the on-premises data platform that supports enterprise analytics, reporting, and machine learning. This includes building scalable data pipelines, ensuring data quality, and enabling secure and efficient access to data across the organization. The role requires strong hands-on engineering capability and the ability to work closely with Data Scientists, Architects, and IT infrastructure teams.
Responsibilities
Key Responsibilities
- Design, build, and maintain batch and streaming data pipelines using tools such as Talend, Kafka, and Spark.
- Develop ingestion, transformation, and cleansing workflows for structured and unstructured datasets.
- Implement and manage CDC (Change Data Capture) and real-time streaming data flows.
- Create and manage datasets and tables on the on-premises data lakehouse platform (Iceberg, Hive Metastore).
- Ensure proper schema design, partitioning, performance tuning, and metadata consistency.
- Work with object storage and file systems to optimize data access and reliability.
- Implement data validation, profiling, and quality checks across pipelines.
- Enforce governance and compliance policies, including access control, auditing, and lineage tracking.
- Maintain documentation for data assets, pipelines, and processes.
- Partner with Data Scientists to provide high-quality datasets for model development.
- Support analytics teams with optimized data structures and interfaces.
- Troubleshoot issues in production data workflows, ensuring reliability and adherence to SLAs.
- Deploy, schedule, and manage data jobs using on-premises orchestration tools.
- Use Git-based processes for version control, deployment, and code reviews.
- Monitor system performance and optimize workloads to reduce compute and storage costs.
Qualifications
Required Skills & Experience
- 6–10+ years of experience in data engineering or related roles.
- Strong hands-on experience with Spark, Kafka, SQL, and Python/Scala.
- Experience working with on-premises data platforms, storage systems, and Kubernetes-based compute.
- Solid understanding of data modelling, ETL/ELT design, and distributed processing.
- Ability to diagnose performance issues and optimize pipelines for scalability.
- Familiarity with data governance, metadata management, and security principles.
Preferred Qualifications
- Experience with machine learning feature pipelines and model operationalization.
- Knowledge of open table formats such as Iceberg, Delta Lake, and Hudi.
- Exposure to monitoring tools such as Prometheus and Grafana.
- Experience working in large-scale, multi-team enterprise environments.