Gramian Consulting Group

AI Evaluation Engineer (Data Analysis & Multi-Agent Systems)

Posted about 1 month ago

Remote, Brazil

About Us

Gramian Consulting Group is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for an AI Evaluation Engineer specialized in data analysis to design benchmark tasks that simulate real-world analytical workflows.

You will create scenarios where AI systems must analyze large, messy, multi-source datasets, decompose tasks across multiple agents, and produce clear, verifiable conclusions.

Commitment required: 8 hours per day, with 4 hours of overlap with PST.

Employment type: Contractor assignment (no medical/paid leave)

Duration of contract: 4 weeks+

Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam

Interview: take-home assessment (60 min)

Responsibilities

  • Design and develop multi-agent benchmark tasks focused on complex data analysis workflows
  • Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)
  • Build tasks requiring:
    • Cross-referencing across multiple data sources
    • Anomaly detection and contradiction identification
    • Statistical analysis and interpretation
  • Define task decomposition strategies across specialized sub-agents (e.g., financial, technical, operational analysis)
  • Develop verification logic to validate precise analytical outputs (not generic summaries)
  • Implement evaluation pipelines using Python and SQL
  • Create reproducible environments using Docker
  • Analyze task performance and refine for clarity, difficulty, and scoring accuracy

Requirements

  • 5+ years of experience in data analysis or analytics-heavy roles
  • Strong proficiency in Python (pandas, NumPy) and SQL
  • Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
  • Ability to design analytical problems with clear, verifiable answers
  • Solid understanding of statistics (distributions, correlations, outliers)
  • Familiarity with AI benchmarks or evaluation environments (e.g., SWE-bench or similar)
  • Hands-on experience with Docker (Dockerfiles, image builds, debugging)

Nice to Have

  • Experience in financial analysis, operations analytics, or risk analysis
  • Exposure to data pipelines or ETL workflows
  • Experience with data quality validation or anomaly detection systems
  • Familiarity with AI/ML data workflows or evaluation frameworks

Job details

Workplace: Remote
Location: Brazil

Gramian Consulting Group

Gramian Consulting is your partner for accessing the engineering capabilities you need, delivered in the model that fits your business, from staff augmentation and talent recruiting to Build-Operate-Transfer (BOT). We combine the perspective of a software engineer, the rigor of a technical recruiter, and the vision of a business builder, so you get experts who understand your challenges and deliver results the right way. This blend is our signature advantage in providing top-quality services quickly and reliably.

Key team members

Emmanuel Yawson

Pauline Perry

Emad Hassan

Aleksandra Šarac
