Senior Site Reliability Engineer

Sana Commerce.com

Office

Alexandria, Alexandria Governorate, Egypt

Full Time

Company Description

At Sana Commerce, we’re committed to creating an inclusive environment because we know our diverse workforce is one of our greatest strengths.

What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company that helps manufacturers, distributors, and wholesalers thrive in B2B commerce complexity.

Our mission? To transform the way businesses buy and sell, so they can grow, build stronger relationships, and make the most of digital commerce. Join us and take ownership of your career in a dynamic, fast-moving environment.

At Sana Commerce, we're looking for a Senior Site Reliability Engineer to strengthen our reliability, observability, and automation capabilities across our Azure and Kubernetes-based platforms. This role blends hands-on operational excellence with engineering practices, ensuring uptime today while building the systems that make tomorrow more resilient.

This SRE position focuses on engineering reliability in everything we do: automating repetitive tasks, improving monitoring signals, running deep root cause analysis, and shaping systems for scalability. You’ll be the engineer others look to during critical incidents, and the one raising the bar on how we prevent them in the first place.

What You'll Get:

  • The opportunity to make an impact at a fast-growing SaaS scale-up;
  • A global and customized onboarding program (rated 9.1/10 by previous hires);
  • A hybrid working model – 3 days from the office, 2 days from home.

Job Description

What you'll be doing

  • Lead incident response and root cause analysis by driving deep investigations, educating the team, and delivering actionable post-incident insights that prevent recurrence.
  • Manage Kubernetes and Azure environments by owning cluster configurations and platform usage, and by ensuring availability, cost efficiency, and adherence to security best practices.
  • Develop observability and monitoring strategies with Dynatrace, Honeycomb, Elasticsearch, Kibana/Grafana, and Azure Monitor to measure performance and user impact, and to continuously refine alerts and dashboards.
  • Implement and maintain edge and CDN integrations (Fastly WAF, bot management, CDN) to enhance performance, security, and reliability of customer-facing services.
  • Write and debug automation scripts in PowerShell, Bash, Python, or C#, ensuring logging, rollback, and versioning practices make the platform more resilient and self-healing.
  • Drive Infrastructure-as-Code adoption with Terraform, Bicep, and ARM to standardize environments, automate deployments, and reduce manual interventions.
  • Optimize system and application performance through deep monitoring, dump analysis, and right-sizing of resources to eliminate bottlenecks and maximize efficiency.
  • Collaborate across teams to break down complex problems, contribute to CI/CD and SDLC improvements, and embed reliability into development and release pipelines.
  • Participate in the on-call rotation by taking ownership of incidents, coordinating responses, and ensuring sustainable fixes rather than temporary workarounds.

Qualifications

What You Bring

  • 8+ years of experience in SRE, DevOps, or Cloud Infrastructure, with demonstrated ownership of large-scale systems.
  • Strong hands-on knowledge of Microsoft Azure services and practical experience operating Azure Kubernetes clusters in production.
  • Expertise in Dynatrace, Honeycomb, Elasticsearch, Kibana/Grafana, and Azure Monitor (KQL). Able to design actionable monitoring that leads to prevention, not just detection.
  • Proficient in at least one programming/scripting language (PowerShell, Bash, Python, or C#). Strong debugging and logging practices.
  • Hands-on experience with Infrastructure-as-Code (Terraform, Bicep, or ARM) to automate and manage cloud infrastructure.
  • Solid understanding of TCP/IP protocols and troubleshooting network issues in distributed systems.
  • Ability to go beyond surface fixes, identify patterns, and engineer permanent improvements.
  • Strong communicator who can work with cross-functional teams and explain complex issues simply.
  • Microsoft Certified: Azure Administrator Associate
  • CKA: Certified Kubernetes Administrator

Who We Are:

So, what does it mean to be a part of the Sana Commerce team?

At Sana Commerce, our values guide how we work, collaborate, and drive success.

  • Champions of Our League. "We deliver lasting success, balancing quick wins and long-term value." We take pride in our unique product and extensive B2B knowledge and continuously strive to improve. No matter our role, we bring value every day, helping our customers and partners succeed.
  • Supercharge Our Customers. "We’re revolutionizing B2B commerce together, helping our customers to lead and succeed." Our customers are at the heart of everything we do. We go beyond solutions, providing the tools and support they need to grow.
  • Determined to Grow. "We embrace challenges, growing and raising the bar for ourselves and our industry." We take on challenges, seek feedback, and keep learning. Every setback is a chance to improve and move forward.
  • Bold Together. "We dare to be bold because we have each other’s back." We collaborate across teams and time zones, challenge the status quo, and support each other to achieve the best outcomes.

Job descriptions can be tough to interpret. Even if you don't tick all the boxes, we strongly encourage you to apply if you feel you are a great match for this role. Please explain your motivation for the role of Senior Site Reliability Engineer in a cover letter. Apply now!

Additional Information

#Li-Hybrid


September 25, 2025
