MLOps Integration Engineer

ECS.com

Office

Arlington, VA, United States

Full Time

ECS is seeking an MLOps Integration Engineer to work in our Arlington, VA office.

Job Summary:

We are seeking an experienced MLOps Integration Engineer to design, deploy, and optimize machine learning pipelines supporting the secure, reliable, and efficient operation of AI models in production. The MLOps Integration Engineer will lead the automation of end-to-end ML workflows—from model deployment and versioning to monitoring, drift detection, and compliance logging. This role focuses on building scalable infrastructure and observability frameworks that ensure models remain performant, traceable, and aligned with mission and business objectives across cloud and on-premises environments.

Responsibilities:

Deploy and manage ML models in production using tools such as MLflow, Kubeflow, or AWS SageMaker, ensuring scalability, low latency, and availability.
Design and maintain dashboards using Grafana, Prometheus, or Kibana to track real-time and historical model performance metrics (e.g., accuracy, latency, throughput).
Build automated drift-detection pipelines using tools such as Evidently AI or Alibi Detect to identify data distribution shifts and trigger retraining or alerts.
Implement centralized logging with ELK Stack or OpenTelemetry to capture inference events, system errors, and audit trails for debugging, compliance, and model governance.
Develop CI/CD pipelines using GitHub Actions, Jenkins, or Azure DevOps to automate model builds, testing, deployment, and rollback.
Apply secure-by-design principles to safeguard AI pipelines through encryption, access control, and compliance with frameworks such as GDPR, HIPAA, and NIST AI RMF.
Partner with data scientists, AI engineers, DevOps, and security teams to ensure seamless model integration and lifecycle management.
Optimize model inference performance through techniques such as quantization, pruning, and container orchestration for efficient resource utilization across AWS, Azure, or Google Cloud.
Develop comprehensive documentation for ML pipelines, observability configurations, and monitoring workflows to promote operational transparency and knowledge sharing.

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related technical discipline.
Minimum of 5 years of experience in MLOps, DevOps, or software engineering, with emphasis on AI/ML systems.
Proven success deploying and maintaining ML models in production using MLflow, Kubeflow, or cloud AI platforms (AWS SageMaker, Azure ML, or Google Vertex AI).
Hands-on experience with observability and monitoring tools such as Prometheus, Grafana, or Datadog.
Proficiency in Python and SQL; familiarity with JavaScript or Go preferred.
Expertise in containerization and orchestration (Docker, Kubernetes) and CI/CD automation (GitHub Actions, Jenkins).
Working knowledge of time-series databases (InfluxDB, TimescaleDB) and logging frameworks (ELK Stack, OpenTelemetry).
Experience with drift detection tools (Evidently AI, Alibi Detect) and visualization libraries (Plotly, Seaborn).
Strong understanding of model performance metrics (precision, recall, F1, AUC) and statistical drift detection techniques (KS test, PSI).
Familiarity with AI security vulnerabilities such as data poisoning and adversarial attacks, with knowledge of mitigation tools like the Adversarial Robustness Toolbox (ART).
Strong problem-solving and debugging ability for complex ML system and pipeline issues.
Excellent collaboration and communication skills across cross-functional technical teams.
High attention to detail to ensure reliability, accuracy, and compliance in system reporting.
Must be a U.S. Citizen and eligible to obtain and maintain a Department of Homeland Security (DHS) EOD clearance (requires a favorable background investigation).