AI-Driven Big Data Engineer (PhD Required)

Pixalate.com

Office

Singapore, Singapore

Full Time

AI-Driven Big Data Engineer

Employment Type: Full-Time
Location: Remote, Singapore
Level: Entry to Mid Level (PhD Required)

Bridge Cutting-Edge AI Research With Petabyte-Scale Data Systems

Pixalate is an online trust and safety platform that protects businesses, consumers, and children from deceptive, fraudulent, and non-compliant mobile apps, CTV apps, and websites. We're seeking a PhD-level Big Data Engineer to revolutionize how AI transforms massive-scale data operations.

Our impact is real and measurable. Our software has uncovered:

About The Role

Work at the intersection of big data and AI, where you'll develop intelligent, self-healing data systems processing trillions of data points daily. You'll have autonomy to pursue research in distributed ML systems and AI-enhanced data optimization, with your innovations deployed at unprecedented scale within months, not years.

This isn't traditional data engineering: you'll implement agentic AI for autonomous pipeline management, leverage LLMs for data quality assurance, and create ML-optimized architectures that redefine what's possible at petabyte scale.

Key Research Areas & Responsibilities

AI-Enhanced Data Infrastructure

  • Design intelligent pipelines with autonomous optimization and self-healing capabilities using agentic AI
  • Implement ML-driven anomaly detection for terabyte-scale datasets

Distributed Machine Learning At Scale

  • Build distributed ML pipelines
  • Develop real-time feature stores for billions of transactions
  • Optimize feature engineering with AutoML and neural architecture search

Required Qualifications

Education & Research

  • PhD in Computer Science, Data Science, or Distributed Systems (exceptional Master's with research experience considered)
  • Published research or expertise in distributed computing, ML infrastructure, or stream processing

Technical Expertise

  • Core Languages: Expert SQL (window functions, CTEs), Python (Pandas, Polars, PyArrow), Scala/Java
  • Big Data Stack: Spark 3.5+, Flink, Kafka, Ray, Dask
  • Storage & Orchestration: Delta Lake, Iceberg, Airflow, Dagster, Temporal
  • Cloud Platforms: GCP (BigQuery, Dataflow, Vertex AI), AWS (EMR, SageMaker), Azure (Databricks)
  • ML Systems: MLflow, Kubeflow, Feature Stores, Vector Databases, scikit-learn with search CV (e.g., GridSearchCV, RandomizedSearchCV), H2O AutoML, auto-sklearn, GCP Vertex AI AutoML Tables
  • Neural Architecture Search: KerasTuner, AutoKeras, Ray Tune, Optuna, PyTorch Lightning + Hydra

Research Skills

  • Track record with 100TB+ datasets
  • Experience with lakehouse architectures, streaming ML, and graph processing at scale
  • Understanding of distributed systems theory and ML algorithm implementation

Preferred Qualifications

  • Experience applying LLMs to data engineering challenges
  • Ability to translate complex AutoML/NAS research into practical production workflows
  • Hands-on project examples of feature engineering automation or NAS experiments
  • Proven success in automating ML pipelines, from raw data to an optimized model architecture
  • Contributions to Apache projects (Spark, Flink, Kafka)
  • Knowledge of privacy-preserving techniques and data mesh architectures

What Makes This Role Unique

You'll work with one of the few truly petabyte-scale production datasets outside of major tech companies, with the freedom to experiment with cutting-edge approaches. Unlike traditional big data roles, you'll apply the latest AI research to fundamental data challenges, from using LLMs to understand data quality issues to implementing agentic systems that autonomously optimize and heal data pipelines.

October 7, 2025
