This job was posted more than 40 days ago and might be expired.

AI Evaluation Engineer - Software Engineering Domain

Gramian Consulting Group

Posted about 1 month ago

Remote · Brazil

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role Overview

We are looking for highly analytical engineers and technical domain experts to contribute to advanced AI evaluation and benchmarking projects focused on realistic terminal-based and infrastructure-heavy workflows. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.

The ideal candidate has strong experience working with production systems, debugging, automation, or large-scale engineering workflows, and can design realistic technical challenges that simulate real-world engineering environments.

This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.

CONTRACT: Contractor assignment (5 weeks)

COMMITMENT: Full-time (40h/week) or part-time (20h/week), with a minimum 4-hour overlap with PST

LOCATION: Remote (Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam)

PROCESS: One technical assessment/interview (~45 min)

Responsibilities

Design realistic terminal-based benchmark tasks for AI evaluation systems
Create technically deep debugging and investigation scenarios
Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
Write clear solution approaches and deterministic evaluation criteria
Identify realistic edge cases, failure modes, and system constraints
Design multi-step reasoning challenges across complex technical environments
Contribute expertise across one or more engineering or operational domains
Review and refine benchmark quality, difficulty, and validation logic
Collaborate with reviewers and researchers on AI evaluation workflows

Requirements

3–10 years of experience in software engineering or related technical domains
Strong debugging, analytical, and systems reasoning skills
Good understanding of system architecture, dependencies, and operational processes
Experience with terminal, CLI, automation, or developer tooling workflows
Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
Ability to design technically rigorous and realistic engineering scenarios



Gramian Consulting Group


Gramian Consulting is your partner for accessing the engineering capabilities you need, delivered in the model that fits your business: staff augmentation, talent recruiting, or Build-Operate-Transfer (BOT). We combine the perspective of a software engineer, the rigor of a technical recruiter, and the vision of a business builder, so you get experts who understand your challenges and deliver results the right way. This blend is our signature advantage in providing top-quality services quickly and reliably.


Key team members

Emmanuel Yawson

Pauline Perry

Emad Hassan

Aleksandra Šarac
