This job was posted more than 40 days ago and might be expired.

AI Evaluation Engineer (Knowledge & Research)

Gramian Consulting Group

Posted about 1 month ago

Remote · Brazil

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for an AI Evaluation Engineer with a strong research background to design and evaluate complex, multi-agent tasks used to benchmark next-generation AI systems.

In this role, you will work at the intersection of research, data structuring, and AI evaluation, building high-quality tasks that require deep document understanding, structured reasoning, and multi-step synthesis. You will create datasets and evaluation frameworks that test whether AI agents can truly read, reason, and extract knowledge from large-scale unstructured data.

This is a high-precision, detail-oriented role requiring strong analytical thinking, structured problem decomposition, and the ability to translate research content into measurable evaluation tasks.

Commitment required: 8 hours per day, with 4 hours of overlap with PST.

Employment type: Contractor assignment (no medical/paid leave)

Contract duration: 5+ weeks

Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam

Interview: take-home assessment (60 min)

Responsibilities

Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
Design LLM judge prompts that evaluate agent output field-by-field against the oracle
Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)
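
The oracle-plus-judge workflow in the responsibilities above can be sketched in Python. This is a minimal illustration under stated assumptions: the oracle layout (a task_id plus a flat fields object), the field names and values, and the score_fields helper are all hypothetical, and a deterministic, normalized exact-match comparison stands in for the LLM judge the role actually calls for.

```python
import json

# Hypothetical oracle for a benchmark task. The layout, field names, and
# values are illustrative only, not taken from any real benchmark.
ORACLE = json.loads("""
{
  "task_id": "demo-001",
  "fields": {
    "paper_count": 3,
    "earliest_publication_year": 2019,
    "primary_method": "random forest"
  }
}
""")


def normalize(value):
    """Collapse whitespace and lowercase strings; leave other values unchanged."""
    if isinstance(value, str):
        return " ".join(value.split()).lower()
    return value


def score_fields(agent_output: dict, oracle: dict) -> dict:
    """Compare the agent's answer to the oracle one field at a time.

    Stand-in for an LLM judge: uses normalized exact matching only.
    Returns a per-field pass/fail map; missing fields count as failures.
    """
    results = {}
    for field, expected in oracle["fields"].items():
        actual = agent_output.get(field)
        results[field] = actual is not None and normalize(actual) == normalize(expected)
    return results


if __name__ == "__main__":
    agent_answer = {
        "paper_count": 3,
        "earliest_publication_year": 2020,  # wrong on purpose
        "primary_method": " Random  Forest ",
    }
    per_field = score_fields(agent_answer, ORACLE)
    print(per_field)
    print(f"score: {sum(per_field.values())}/{len(per_field)}")
```

With the sample answer, paper_count and primary_method pass (the latter only after case and whitespace normalization), earliest_publication_year fails, and the script reports score: 2/3. A real judge prompt would also need to handle paraphrase and partial credit, which exact matching cannot.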

Requirements

5+ years of experience in research (academic or industry) in a scientific, technical, or analytical domain
Strong ability to read, analyze, and extract structured information from unstructured documents
Experience designing or working with structured data formats (JSON, schemas, validation)
Proficiency in Python scripting (data processing, validation, or evaluation scripts)
Experience with AI evaluation, coding benchmarks, or structured reasoning tasks (e.g., SWE-bench, Terminal-bench, or similar)
Experience working with Docker (building images, debugging containers)
Strong attention to detail, especially when defining exact, verifiable outputs
Ability to design complex, multi-step problem-solving workflows
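
The requirements around structured data formats and Python validation scripts can be illustrated with a small standard-library check that an oracle file is well-formed before it is used for scoring. The required keys, the scalar-only rule for answers, and the validate_oracle function are assumptions made for this sketch, not a format specified by the posting; a production setup might use a full JSON Schema validator instead.

```python
import json

# Hypothetical minimal "schema": required top-level keys and their types.
REQUIRED_KEYS = {"task_id": str, "fields": dict}
# Assumption: only scalar answers are allowed, so each field can be checked exactly.
ALLOWED_VALUE_TYPES = (str, int, float, bool)


def validate_oracle(raw: str) -> list:
    """Return a list of problems found in an oracle JSON document (empty if valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    # Check required keys and their types.
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"{key} must be {expected_type.__name__}")

    # Check that every answer is a specific, non-empty scalar.
    fields = data.get("fields")
    if isinstance(fields, dict):
        if not fields:
            problems.append("fields must not be empty")
        for name, value in fields.items():
            if not isinstance(value, ALLOWED_VALUE_TYPES):
                problems.append(f"field {name!r} must be a scalar, got {type(value).__name__}")
            elif isinstance(value, str) and not value.strip():
                problems.append(f"field {name!r} is an empty string")
    return problems


if __name__ == "__main__":
    good = '{"task_id": "demo-001", "fields": {"paper_count": 3}}'
    bad = '{"fields": {"summary": "", "authors": ["A", "B"]}}'
    print(validate_oracle(good))
    print(validate_oracle(bad))
```

The good example returns an empty list; the bad one reports the missing task_id, the empty summary string, and the list-valued authors field.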

Gramian Consulting Group

Gramian Consulting is your partner for accessing the engineering capabilities you need—delivered in the model that fits your business, from staff augmentation and talent recruiting to Build-Operate-Transfer (BOT). We combine the perspective of a software engineer, the rigor of a technical recruiter, and the vision of a business builder, so you get experts who understand your challenges and deliver results the right way. This blend is our signature advantage in providing top-quality services, fast and reliably.

Key team members

Emmanuel Yawson

Pauline Perry

Emad Hassan

Aleksandra Šarac