About Code Metal
Code Metal is redefining code translation for mission-critical industries, helping defense partners move more quickly and reliably from algorithm to silicon. Our platform accelerates deployment of DSP, RF, communications, and embedded signal processing algorithms onto heterogeneous compute targets, including GPUs, FPGAs, ASICs, and edge SoCs. We also support automotive, aerospace, and semiconductor partners deploying complex algorithms onto constrained hardware with speed and rigor.
About the Role
We're building next-generation AI systems that help military planners explore, compare, and evaluate operational courses of action. Our work combines frontier language models, simulation, planning, and verification into human-in-the-loop decision-support systems for defense applications. As an Applied AI Research Engineer, you’ll focus on human-machine teaming and agentic AI to build systems that allow warfighters, planners, analysts, and decision-makers to explore operational choices with speed, confidence, and control.
This role focuses on designing and building agentic AI systems – not chatbots. You'll develop multi-agent workflows, fine-tune and evaluate models, build retrieval pipelines, experiment with post-training techniques, and integrate AI with simulation and planning software. You'll work closely with AI researchers, software engineers, and defense experts to turn research ideas into production-ready capabilities. The goal is to make complex planning, wargaming, adjudication, and analysis workflows faster, more explainable, and more trustworthy.
Research Areas of Interest
An incomplete list of ongoing and near-term directions:
Human-machine teaming for AI-assisted course-of-action development, comparison, critique, refinement, and operational decision support
Agentic planning systems that integrate language models with simulation, doctrine retrieval, external tools, structured outputs, and deterministic verification
Adapting and optimizing foundation models through fine-tuning, post-training, distillation, reinforcement learning, and rigorous evaluation for planning and decision-support tasks
Multi-agent AI systems for Red/Blue planning, control-cell support, adjudication, branch-and-sequel analysis, and collaborative planning workflows
Building reliable AI systems using self-correction, structured reasoning, constraint-aware generation, verification, and robust tool use
Learning from human expertise through planner feedback, preferences, approvals, synthetic data generation, and human-in-the-loop improvement
Trustworthy AI for high-consequence applications, with an emphasis on explainability, provenance, traceability, auditability, uncertainty estimation, and model behavior analysis
What You’ll Do
Design and build agentic AI systems for planning, decision support, and human-machine teaming
Develop AI pipelines that integrate foundation models, retrieval, simulation, external tools, and deterministic software
Design, run, and analyze experiments to evaluate model and agent performance, reliability, traceability, latency, cost, and user trust
Fine-tune, distill, and evaluate foundation models for domain-specific planning, reasoning, and decision-support tasks
Build datasets, retrieval pipelines, automated benchmarks, and experiment infrastructure to support continuous model improvement and reproducible research
Partner with software engineers to transition research prototypes into scalable AI services
Collaborate with domain experts to translate operational workflows into AI-enabled capabilities while ensuring AI outputs remain explainable, reviewable, and under human control
Why Code Metal?
Mission with impact: Build AI systems that help users reason through high-consequence operational decisions.
AI beyond demos: Work on systems where models are paired with software, verification, simulation, guardrails, and human oversight.
Greenfield research: Explore ambitious ideas in GenAI, RL, agentic workflows, evaluation, and human-machine teaming.
Small-team velocity: Move quickly from research question to prototype to user-facing capability.
Real users: See your work tested by planners, analysts, engineers, and operational stakeholders.
Must-Have Credentials
Bachelor's or Master's degree in Computer Science, Machine Learning, Engineering, Mathematics, Physics, or a related technical field, or equivalent practical experience.
3+ years building AI, machine learning, or applied research systems.
Strong Python engineering skills.
Experience with PyTorch and modern LLM tooling (Transformers, vLLM, Hugging Face, etc.).
Experience building or deploying agentic AI systems, tool-calling workflows, or multi-step reasoning pipelines.
Experience fine-tuning, evaluating, or serving language models.
Experience with retrieval-augmented generation, embeddings, vector search, or knowledge retrieval systems.
Strong understanding of experiment design, benchmarking, and model evaluation.
Ability to move quickly from research prototype to production-quality implementation.
Eligible to obtain a U.S. security clearance.
Benefits
Pay depends on experience, but we strive to be at the upper end of the salary range
Health care plan with 100% premium coverage, including medical, dental, and vision
401(k) with 5% matching
Paid Time Off (uncapped vacation, plus sick and public holidays)
Flexible hybrid or remote work arrangement
Relocation assistance for qualifying employees
Wage Transparency - The salary range for this role is not a guarantee of compensation or salary, as the final offer amount may vary based on factors including, but not limited to, individual proficiency, skills, experience, and location.
We are an equal opportunity employer. U.S. citizenship may be required for certain project assignments involving security clearance.
Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.