SRE Engineer - Data Analytics
DMV IT Service.com
Office
Washington, District of Columbia, United States
Full Time
Job Title: SRE Engineer – Data Analytics
Location: Washington, DC
Employment Type: Contract
About Us
DMV IT Service LLC, founded in 2020, is a trusted IT consulting firm specializing in IT infrastructure optimization, cybersecurity, networking, and staffing solutions. We partner with clients to help them achieve their technology goals through expert guidance, workforce support, and innovative solutions. With a client-focused approach, we also provide online training and job placement services that support long-term IT success.
Job Purpose
We are seeking a skilled and motivated SRE Engineer – Data Analytics to enhance the reliability, performance, and scalability of key data and analytics platforms. The ideal candidate will bring strong expertise in automation, CI/CD, cloud infrastructure (AWS/Azure), and observability tools while ensuring service stability and operational excellence across data environments.
Requirements
Key Responsibilities:
Deployment & Automation
- Design, implement, and manage CI/CD pipelines using GitHub Actions, Jenkins, or AWS CodePipeline.
- Automate infrastructure provisioning through Infrastructure-as-Code (IaC) tools like Terraform, AWS CDK, or CloudFormation.
- Develop automation scripts and self-service tools to reduce manual work and enhance operational efficiency.
Performance & Optimization
- Lead cloud infrastructure cost optimization and performance improvement initiatives.
- Configure and monitor auto-scaling, performance thresholds, and resource utilization.
- Conduct resiliency and performance tests to ensure system stability under varying workloads.
Incident Management & Reliability
- Serve as the first responder for production incidents and troubleshoot complex technical issues.
- Utilize ITIL concepts and ITSM tools (e.g., ServiceNow) for managing incidents and change processes.
- Prepare detailed Root Cause Analysis (RCA) reports and create knowledge base documentation.
- Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
Monitoring & Observability
- Configure and manage observability platforms (e.g., Dynatrace, AppDynamics, ELK).
- Implement distributed tracing and create actionable dashboards and alert systems.
- Continuously improve monitoring queries, anomaly detection, and alert tuning.
Data Platform Reliability
- Maintain reliability and performance of Databricks clusters, Informatica workflows, and Power BI integrations.
- Oversee access control, error handling, and workflow orchestration across data systems.
- Ensure consistent data refreshes and secure connections across analytics platforms.
Security & Compliance
- Manage access control and permissions following the principle of least privilege.
- Deploy and maintain digital certificates and TLS/SSL configurations.
- Perform vulnerability remediation and support security incident response.
Required Skills & Experience:
- Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
- 2–4 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
- Hands-on experience with AWS and Azure platforms.
- Proficiency in scripting languages such as Python, Bash, or Go.
- Familiarity with configuration management tools (e.g., Ansible).
- Knowledge of containerization and container orchestration (Docker, Kubernetes/ECS).
- Strong understanding of Linux systems, networking (TCP/IP, DNS, load balancing), and databases (SQL, NoSQL, Amazon RDS).
- Experience supporting platforms such as Databricks, Informatica, or Power BI is highly preferred.
October 10, 2025