Sr. Site Reliability Engineer (Remote, Mexico)
IO Connect Services
Hybrid
Mexico
Full Time
About IO Connect Services:
IO Connect Services is an AWS Advanced Tier Services Partner and Datadog Partner with a commitment to delivering complex and well-architected technical solutions worldwide. Founded in 2016, our professionals are dedicated to establishing and maintaining trust with our clients and business partners for long-term relationships.
Position Overview:
As we expand customer deployments, we’re seeking an experienced SRE. Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.
Responsibilities
- Design, build, maintain, and scale production services and server farms across multiple data centers for complex and data-intensive cloud services.
- Design and enhance software architecture to improve scalability, service reliability, capacity, and performance.
- Write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you’re an experienced software engineer focused on operations.
- Work with development teams to make sure the applications fit nicely within the infrastructure and that scalability and reliability are designed and implemented from the ground up. You will work with QA on building pipelines and automation for delivering and deploying applications to production.
- Roll up your sleeves to troubleshoot incidents, formulate theories and test your hypotheses, and narrow down possibilities to find the root cause.
- Write postmortem reviews and remediation recommendations.
- Identify bad trends before they become problems; respond to automated system alerts, effectively troubleshoot system errors, and work incidents to return systems to normal operating conditions.
- Author and update high-quality documentation of all relevant specifications, systems, and procedures.
- Support and comply with the company’s Quality Management System policies and procedures.
Required skills and qualifications
- Bachelor’s degree (or equivalent) in computer science or a related discipline.
- Knowledge of IaC technologies such as Terraform, Ansible, Puppet, and Chef.
- Knowledge of cluster creation and management with Kubernetes.
- Knowledge of Microsoft Azure, AWS, and Google Cloud, including Azure services, Azure Virtual Machines, and virtual network configuration.
- Knowledge of cloud service models such as IaaS, PaaS, and SaaS.
- Knowledge of CI/CD.
- Scripting knowledge with PowerShell.
- Knowledge of IP addressing and subnet masks.
- Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript.
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
What we offer:
- Base salary and permanent contract directly with the company
- Continuous training plan with paid certifications
- Career plan according to your development and knowledge
- Benefits above the law: 12 days of Paid Time Off, 30-day Christmas Bonus, Medical Insurance, Life Insurance, Savings Fund, Groceries Bonus
- Quarterly Performance Bonus
- Computer equipment for your work
- Optional 100% Home Office
August 3, 2025