Senior AI Engineer

Milestone Technologies, Inc.

Hybrid

Remote, IN

Full Time

Job Overview

We’re hiring a Senior AI Engineer to build production-grade components for an AI-first, data-centric platform. You will implement agentic capabilities (intent, planner, router/composer), integrate knowledge-graph reasoning alongside a strong RAG baseline, and instrument robust evaluation and observability. The ideal candidate writes clean, reliable code, understands LLM systems and data retrieval trade-offs, and can optimize for latency, quality, and cost.

Key Responsibilities

Agent Implementation: Build and harden Intent, Planner, and Router/Composer agents with typed JSON I/O, retries/timeouts, and idempotency; emit call-graph traces and correlation IDs.
Knowledge-Graph Reasoning: Generate correct graph queries (SPARQL/Gremlin/PGQL) from planner outputs; perform subgraph extraction; encode rationale and references in responses.
RAG Baseline & Retrieval: Implement document prep, chunking/embeddings, hybrid retrieval and (where available) reranking; maintain a high-quality baseline path for side-by-side comparisons.
Prompt/Config Tuning: Version and tune prompts, routing policies (small→large model escalation), temperature/top-p settings, and caching; document routing outcomes and cost/latency budgets.
Evaluation Hooks: Integrate test sets and scoring (faithfulness/correctness, precision/recall, multi-hop coverage, latency); enable automated re-evaluation on any change (model/agent/prompt/data).
Observability & Cost Controls: Instrument traces/metrics/logs (token usage, latency P50/P95, error codes); surface cost-per-answer dashboards; implement backpressure and graceful degradation.
Security & Guardrails: Enforce policy-as-code and entitlement checks (role/row/column), PII/PHI handling, content moderation, and human-in-the-loop (HITL) approval prompts for state-changing actions.
Quality & CI/CD: Write unit/integration/contract tests; participate in PR reviews; ship via CI/CD with feature flags and environment promotion; maintain API/connector schemas and docs.

Required Skills

Applied LLM Engineering: 1-2+ years building production services; hands-on with LLM tool/function-calling, agent frameworks, and prompt/version management.
Knowledge & Retrieval: Practical experience with Knowledge Graphs (RDF/SPARQL or property graph/Gremlin) and RAG pipelines (chunking, embeddings, retrieval/reranking).
Data/Model Ecosystem: One or more vector DBs (pgvector, Pinecone, Weaviate, Milvus) and search (OpenSearch/Elasticsearch); familiarity with major model platforms (Azure OpenAI, Vertex, Anthropic, open-weights).
Backend Skills: Proficiency in Python and/or TypeScript/Node.js; strong REST/gRPC API design, JSON Schema/OpenAPI, retries/backoff/idempotency, and error taxonomies.
Observability & Reliability: OpenTelemetry (traces/metrics/logs), performance profiling, resiliency patterns (circuit breakers, bulkheads, DLQ/queues).
Security by Design: OIDC/SSO, secrets management, least-privilege access, audit logging, and secure coding for AI/data services.
CI/CD & Testing: Git-based workflows, automated pipelines, unit/integration/contract tests, and environment promotion practices.

Good to Have Skills

Ontology & Data Quality: SHACL/OWL basics, ontology stewardship, lineage/provenance capture, and data quality checks for KG/RAG pipelines.
Evaluation Engineering: Judge-model setups, A/B testing, rubric design, and regression dashboards.
Performance & FinOps: Async I/O, caching strategies, connection pooling, and token/runtime budget enforcement.
Runtime & Platform: Containers/Kubernetes, service mesh/API gateways, feature flags, blue/green or canary releases.
UX for Explainability: Collaborating on rationale/explanations (source lists, subgraph summaries) and clear HITL approval prompts.

This role is ideal for a hands-on engineer who enjoys turning advanced reasoning patterns into robust, observable services while balancing quality, safety, and cost at enterprise scale.