AI Developer
Salvo Software
Posted 21 days ago
About Salvo Software
Salvo Software is a global firm that provides cost-effective software solutions to guide enterprises and startups through digital transformation. With distributed teams across the US, LATAM, and India, we partner with clients to build high-performance, scalable systems that solve complex technical challenges. Our culture values innovation, ownership, and engineering excellence.
Role Overview
We are seeking a highly skilled AI Developer with a strong backend and machine learning engineering background to design, train, optimize, and deploy LLMs in on-prem and offline environments. This role is deeply technical and hands-on, requiring expertise across Python ML stacks, model optimization, local inference frameworks, RAG (Retrieval-Augmented Generation) architectures, MCP (Model Context Protocol) integrations, and DevOps workflows tailored for offline systems.
You will work closely with our engineering and product teams to build end-to-end LLM pipelines — including data preprocessing, supervised fine-tuning, model quantization, evaluation, RAG pipeline design, and deployment using local or air-gapped infrastructure. If you enjoy working with cutting-edge open-source LLMs, building context-aware AI systems, and designing reliable backend pipelines, this role is for you.
Key Responsibilities
Core LLM Development
- Train and fine-tune LLMs using supervised fine-tuning (SFT).
- Work with open-source models such as LLaMA, Mistral, Qwen, and similar architectures.
- Build LoRA / Q-LoRA pipelines for efficient fine-tuning.
- Implement and optimize data preprocessing workflows, including tokenization and long-context handling.
- Use and extend Hugging Face Transformers & Datasets for training and inference.
- Parse and process structured and semi-structured data, including XML/XSD files.
- Implement document parsing solutions for Office formats (e.g., Word documents via python-docx and the underlying Office Open XML format).
RAG & Context-Aware Systems
- Design and implement end-to-end Retrieval-Augmented Generation (RAG) pipelines for document-grounded question answering and knowledge retrieval.
- Build and maintain vector stores and embedding pipelines using tools such as FAISS, Chroma, Weaviate, or pgvector.
- Optimize retrieval strategies including hybrid search, re-ranking, and chunking approaches tailored for domain-specific corpora.
- Develop and maintain MCP (Model Context Protocol) server integrations to enable LLMs to interact dynamically with tools, APIs, and external data sources.
- Design agentic workflows that leverage MCP to give models structured access to internal systems and context in a controlled, auditable manner.
Offline / On-Prem Model Expertise
- Deploy, run, and maintain models fully offline and in air-gapped environments.
- Perform model optimization and quantization (GGUF, GPTQ, AWQ, bitsandbytes).
- Build and maintain inference systems using frameworks like vLLM, TGI, and Ollama.
- Optimize GPU usage (CUDA, cuDNN, VRAM-aware batching).
- Maintain local CI/CD pipelines for ML models without cloud dependencies.
- Manage local model registries, versioning, and artifacts.
- Ensure RAG and MCP components are fully operational in offline and restricted network environments.
Backend & DevOps
- Build backend services in Python for ML training and inference workflows.
- Work with relational databases (Postgres/MySQL) and vector databases for RAG storage layers.
- Use Docker and Git for reliable development and deployment pipelines.
- Use Azure DevOps for CI/CD, including self-hosted (local) agents when applicable.
Requirements
Technical Skills
- Strong experience in Python for backend and ML development.
- Expertise with an ML framework such as PyTorch or TensorFlow, plus scikit-learn and pandas.
- Solid knowledge of Postgres or MySQL for data storage.
- Experience with Docker, Git, and DevOps best practices.
- Hands-on expertise with LLM training, fine-tuning, and optimization.
- Experience with Hugging Face Transformers & Datasets.
- Familiarity with XML/XSD and Office document parsing tools.
- Experience deploying models with vLLM, TGI, or Ollama.
- Understanding of quantization techniques (GGUF/GPTQ/AWQ).
- Experience with GPU optimization and the CUDA stack.
- Ability to build solutions for offline, on-prem, and air-gapped environments.
- Hands-on experience designing and implementing RAG pipelines, including embedding models, vector stores (FAISS, Chroma, Weaviate, or pgvector), and retrieval optimization strategies.
- Experience building or integrating MCP (Model Context Protocol) servers to connect LLMs with external tools, APIs, and structured data sources.
Nice to Have
- Experience building agentic systems using MCP in production or near-production environments.
- Familiarity with advanced RAG techniques such as HyDE, re-ranking, or multi-hop retrieval.
- Experience managing ML model registries in offline environments.
- Familiarity with AWS for hybrid deployments.
- Experience with secure environments, restricted networks, or enterprise compliance requirements.