ML Ops Engineer Specialist

Invisible Technologies

Office

Brazil

Full Time

Target Profile:

2+ years of experience building and maintaining ML infrastructure or platforms in production environments.
Demonstrated ability to take ML models from experimentation to deployment using MLOps best practices.
Experience collaborating with data scientists, ML engineers, and backend teams on cross-functional projects.

Technical Expertise:

Proficiency in Python and core ML tooling (e.g., MLflow, Kubeflow, Airflow, Docker, Git).
Familiarity with model training frameworks such as PyTorch, ONNX, or scikit-learn.
Experience with CI/CD pipelines tailored to ML systems (e.g., model validation checks, artifact versioning).
Comfortable managing infrastructure via cloud services (GCP, AWS) and container orchestration platforms (e.g., Kubernetes).
Strong debugging and performance tuning skills across data, model, and infrastructure layers.

Bonus (Nice To Haves):

Hands-on experience with Databricks or similar distributed compute environments.
Familiarity with data engineering tools and workflow orchestration (Spark, dbt, Prefect).
Knowledge of monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry) for ML systems.
Exposure to regulatory/compliance-aware ML deployment (audit logs, reproducibility, rollback strategies).

Project Overview & Deliverables:

Project Overview

You’ll design and implement robust infrastructure to enable scalable, reliable, and reproducible machine learning workflows. You’ll streamline the lifecycle of ML models, from experimentation to deployment, ensuring our systems are production-grade and future-proof.

Deliverables:

Build Scalable ML Infrastructure: Architect, deploy, and maintain pipelines and tooling that support versioning, training, testing, and deployment of machine learning models across a variety of environments.
Bridge Research and Production: Work closely with ML researchers, data scientists, and backend engineers to turn experimental models into efficient, production-ready services and APIs.
Focus on Automation and Reliability: Implement systems for continuous integration, model monitoring, auto-scaling, and failover, with a strong emphasis on observability and operational excellence.
Optimize Cloud Resources: Tune compute usage across cloud, on-prem, and hybrid environments (e.g., GCP, AWS), reducing latency and cost while maintaining high reliability.
Document Best Practices: Write up MLOps best practices for model versioning, reproducibility, metadata tracking, and experiment lineage.

Important:

All candidates must pass an interview as part of the contracting process.

We offer a pay range of $30+ per hour, with the exact rate determined after evaluating your experience, expertise, and geographic location. Final offer amounts may vary from the pay range listed above. As a contractor, you’ll supply a secure computer and high-speed internet; company-sponsored benefits such as health insurance and PTO do not apply.

We are looking for independent consultants and contractors who operate their own business.