AI Engineering Lead

DEUNA

Posted about 10 hours ago

About the Role

Athia is DEUNA's AI-powered payment intelligence platform — moving from early ML experimentation to the critical infrastructure behind billions of dollars in annual transaction volume. We are looking for a hands-on Engineering Lead who can own the full technical stack: from model development and data pipelines to production payment orchestration, cloud/on-prem deployments, and real-time observability.

This is not a coordination role. You will build, ship, and own. You will be the technical authority that bridges AI/ML systems with our core payments stack, leading both the platform engineering and the modeling lifecycle end-to-end.

Core Responsibilities

1 · AI/ML Model Ownership

  • Design, train, and fine-tune ML models for payment optimization use cases — including authorization rate improvement, dynamic routing, cost minimization, and fraud signal detection.

  • Select and apply the right frameworks (PyTorch, TensorFlow, scikit-learn) per model type and latency budget.

  • Own the model lifecycle: experimentation → offline evaluation → shadow deployment → A/B testing → production promotion.

  • Monitor and remediate model drift, data distribution shifts, and performance degradation proactively.

  • Define evaluation metrics that map directly to business KPIs (approval rate lift, GMV impact, provider cost).

2 · Data Pipelines & Feature Engineering

  • Architect and build optimized data pipelines to collect, clean, and preprocess high-volume transaction data for model training and inference.

  • Design feature stores and real-time feature serving layers that keep inference latency within payments SLA requirements (<100 ms).

  • Establish data quality standards, schema validation, and lineage tracking across the ML data stack.

  • Partner with the Data Engineering team to ensure training data reflects the full distribution of providers, regions, and merchant types in our network.

3 · Production Deployment & Payments Stack Integration

  • Integrate ML model outputs into DEUNA's live payment routing and orchestration layer with zero tolerance for latency regressions or silent errors.

  • Develop and own the inference service layer in Go and Python, ensuring thread-safe, performant, and fault-tolerant operation under peak transaction load.

  • Lead the design of hybrid deployment architectures: cloud-native (AWS/GCP) and on-premise client environments, including secure bi-directional data synchronization.

  • Build and maintain RESTful and gRPC APIs that expose Athia capabilities to the broader DEUNA platform and external partners.

4 · Observability, Monitoring & Incident Response

  • Own the full observability stack for Athia: real-time dashboards, alerting thresholds, anomaly detection, and post-incident reviews.

  • Implement model-specific monitoring (prediction distributions, confidence scores, provider error rates) alongside standard infrastructure metrics.

  • Create a fast feedback loop with the Operations team to detect and remediate routing degradation or GMV impact within SLA.

  • Define on-call runbooks and escalation paths that are clear, tested, and kept up to date.

5 · Scalability, Resiliency & Engineering Leadership

  • Provide architectural guidance to scale Athia to handle 10M+ monthly transactions across concurrent global partner launches.

  • Lead and mentor engineers through architecture reviews, code reviews, technical planning, and day-to-day execution.

  • Drive engineering best practices: testing strategy (unit, integration, shadow), CI/CD pipelines, documentation standards, and security compliance.

  • Translate business and product goals into concrete technical roadmaps with realistic timelines and clear dependency mapping.

Requirements

Backend & Infrastructure

  • Go (Golang): production-grade services

  • Python: ML pipelines, model serving, tooling

  • RESTful APIs and gRPC

  • Distributed systems & event-driven architecture

  • CI/CD, Docker, Kubernetes

  • Cloud platforms (AWS or GCP)

  • Hybrid / on-prem deployment patterns

AI / ML Stack

  • PyTorch or TensorFlow: training & fine-tuning

  • scikit-learn, XGBoost, or tabular ML

  • MLflow, Weights & Biases, or equivalent

  • Feature engineering & feature stores

  • Model monitoring & drift detection

  • A/B testing and shadow deployment

  • Low-latency inference architectures

Frontend & Full-Stack

  • React and Next.js

  • TypeScript

  • Component design systems

  • API integration patterns

Observability & Data

  • Prometheus, Grafana, or Datadog

  • Structured logging & distributed tracing

  • SQL and analytical query patterns

  • Data pipeline tooling (Airflow, dbt, etc.)

Experience

  • 6+ years in software engineering with strong backend foundations.

  • 2+ years in a Tech Lead or Staff Engineer role owning a production platform end-to-end.

Job details

Workplace: Office

Location: San Francisco

Experience: SE
