Senior Site Reliability Engineer II

Careem.com

Office

Islamabad, Pakistan; Karachi, Pakistan; Lahore, Pakistan

Full Time

Careem is building the Everything App for the greater Middle East, making it easier than ever to move around, order food and groceries, manage payments, and more. Careem is led by a powerful purpose to simplify and improve the lives of people and build an awesome organisation that inspires. Since 2012, Careem has created earnings for over 2.5 million Captains, simplified the lives of over 70 million customers, and built a platform for the region’s best talent to thrive and for entrepreneurs to scale their businesses. Careem operates in over 70 cities across 10 countries, from Morocco to Pakistan.

Why Join Us?

At Careem, you’ll:

Work with one of the region’s most advanced engineering platforms.
Solve real-world challenges at scale impacting millions of users.
Learn and grow with a high-performance team.
Contribute to AI-integrated infrastructure that supports both traditional services and next-gen intelligent agents.

About The Role

As an SRE (L10) on the Storage & Infrastructure team, you’ll focus on building, scaling, and automating our core data services. You’ll work with a range of distributed systems including MySQL, Postgres, Kafka, Cassandra, Redis, and OpenSearch, while also supporting emerging workloads like vector databases, embedding stores, and LLM query caches. This role blends operational excellence with the opportunity to support AI-driven use cases like retrieval-augmented generation (RAG), agent memory systems, and AI observability tooling.

What You’ll Do

Deploy, scale, and maintain cloud-native data systems on AWS.
Automate storage operations using IaC (Terraform, Pulumi, etc.).
Support AI-related infrastructure (e.g. Milvus, Weaviate, or Pinecone).
Collaborate with ML engineers and platform teams to support LLM-powered services.
Optimize and monitor performance across services using Prometheus, Grafana, OpenTelemetry, etc.
Participate in on-call rotations and contribute to post-incident reviews.
Help design secure, scalable environments that are AI-ready and cost-efficient.

You’ll Thrive If You Have

5–8 years of experience operating distributed systems at scale.
Proficiency with one or more languages (e.g. Go, Python, Bash).
Strong understanding of cloud infrastructure (preferably AWS).
Experience with IaC and CI/CD pipelines.
Familiarity with Kafka, Redis, Cassandra, or similar systems.
Exposure to AI infrastructure (bonus): vector stores, model serving platforms, and LLM application frameworks (e.g. Ray, LangChain, LlamaIndex).
Curiosity to learn about integrating infrastructure with AI agents and LLM-based applications.

What We’ll Provide You

We offer colleagues the opportunity to drive impact in the region while they learn and grow. As a full-time Careem colleague, you will be able to:

Work and learn from great minds by joining a community of inspiring colleagues.
A chance to help shape the future of AI-read
Put your passion to work in a purposeful organisation dedicated to creating impact in a region with a lot of untapped potential.
Explore new opportunities to learn and grow every day.
Work 4 days a week in the office and 1 day from home, work remotely from any country in the world for 30 days a year, and take unlimited vacation days. (If you are in an individual contributor role in tech, you will have 2 office days a week and 3 days working from home.)
Access healthcare benefits and fitness reimbursements for health activities, including gym, health club, and training classes.