
AI Evaluation Engineer - Planning & Operations

Posted about 2 months ago

Remote, Brazil

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for an AI Evaluation Engineer specializing in planning and operations to design and build benchmark tasks that simulate real-world scenarios such as scheduling, logistics, and resource allocation.

This role focuses on planning, scheduling, and operational optimization problems, where multiple agents must collaborate to solve constraint-rich scenarios involving resources, timelines, and dependencies.

Commitment required: 8 hours per day, with 4 hours of overlap with PST.

Employment type: Contractor assignment (no medical/paid leave)

Duration of contract: 4 weeks+

Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam

Interview: Take-home assessment (60 min) + short interview

Responsibilities

  • Design and build multi-agent benchmark tasks involving:
    • Planning, scheduling, and resource allocation
    • Operational decision-making (logistics, project planning, incident response, capacity planning)
  • Create constraint-rich problem statements with multiple interacting variables
  • Develop verification scripts to evaluate:
    • Feasibility (all constraints satisfied)
    • Completeness (all requirements met)
    • Optimality (efficiency of solutions)
  • Define task decomposition strategies across specialized sub-agents (e.g., resource allocation, constraint resolution, optimization)
  • Model realistic operational systems with dependencies, timelines, and constraints
  • Implement validation logic and evaluation pipelines using Python
  • Work with Docker environments for reproducibility and execution
  • Collaborate with internal teams to improve task quality, coverage, and evaluation rigor
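
To give a feel for the verification scripts mentioned above, here is a minimal Python sketch of a checker for a small scheduling task. It is illustrative only and not taken from the actual benchmark: the `Task` fields, the `verify` function, the one-task-at-a-time resource rule, and the `best_known_makespan` reference value are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task model: name, duration, one exclusive resource, prerequisites.
    name: str
    duration: int
    resource: str
    depends_on: tuple = ()


def verify(tasks, schedule, best_known_makespan):
    """Check a proposed schedule (task name -> start time).

    Returns completeness, feasibility, and an optimality ratio in [0, 1]
    relative to an assumed best-known makespan.
    """
    by_name = {t.name: t for t in tasks}

    # Completeness: every required task has a start time.
    missing = sorted(set(by_name) - set(schedule))

    violations = []
    end = {n: s + by_name[n].duration for n, s in schedule.items() if n in by_name}

    # Feasibility, part 1: a task may not start before its prerequisites finish.
    for name, start in schedule.items():
        task = by_name.get(name)
        if task is None:
            violations.append(f"unknown task {name}")
            continue
        for dep in task.depends_on:
            if dep not in end or start < end[dep]:
                violations.append(f"{name} starts before {dep} finishes")

    # Feasibility, part 2: a resource handles one task at a time (assumed rule).
    by_resource = {}
    for name, finish in end.items():
        by_resource.setdefault(by_name[name].resource, []).append(
            (schedule[name], finish, name)
        )
    for resource, intervals in by_resource.items():
        intervals.sort()
        for (_, e1, n1), (s2, _, n2) in zip(intervals, intervals[1:]):
            if s2 < e1:
                violations.append(f"{n1} and {n2} overlap on {resource}")

    feasible = not missing and not violations
    makespan = max(end.values(), default=0)
    # Optimality is only scored for feasible schedules.
    ratio = best_known_makespan / makespan if feasible and makespan > 0 else 0.0
    return {
        "missing": missing,
        "violations": violations,
        "feasible": feasible,
        "makespan": makespan,
        "optimality": ratio,
    }


tasks = [
    Task("A", 2, "crew1"),
    Task("B", 3, "crew1", depends_on=("A",)),
    Task("C", 2, "crew2"),
]
good = verify(tasks, {"A": 0, "B": 2, "C": 0}, best_known_makespan=5)
bad = verify(tasks, {"A": 0, "B": 1, "C": 0}, best_known_makespan=5)
partial = verify(tasks, {"A": 0}, best_known_makespan=5)
```

In this sketch, `good` is feasible with a makespan of 5, `bad` is rejected because B starts before A finishes and both compete for the same resource, and `partial` is rejected as incomplete. A real benchmark verifier would likely cover more constraint types and run inside a Docker image for reproducibility.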

Requirements

  • 5+ years of experience in operations, project management, logistics, or supply chain
  • Strong ability to formalize constraints, dependencies, and scheduling logic
  • Proficiency in Python for building validation and verification scripts
  • Experience with optimization techniques (linear programming, constraint satisfaction, scheduling algorithms)
  • Strong structured problem-solving and decomposition skills
  • Experience with AI benchmarks or evaluation frameworks (e.g., SWE-bench or similar)
  • Hands-on experience with Docker (Dockerfiles, image builds, debugging)

Nice to Have

  • Background in operations research or optimization-heavy domains
  • Experience with simulation or modeling tools
  • Familiarity with AI planning systems or automated reasoning
  • Project management experience or certifications (PMP, Agile, etc.)

Gramian Consulting Group

Gramian Consulting is your partner for accessing the engineering capabilities you need—delivered in the model that fits your business, from staff augmentation and talent recruiting to Build-Operate-Transfer (BOT). We combine the perspective of a software engineer, the rigor of a technical recruiter, and the vision of a business builder, so you get experts who understand your challenges and deliver results the right way. This blend is our signature advantage in providing top-quality services, fast and reliably.

Key team members

Emmanuel Yawson

Pauline Perry

Emad Hassan

Aleksandra Šarac
