L1/L2 Support Engineer (24/7 Application & Monitoring), CICD Platform & Toolkit Section - CICD Factory Department (RS GSD Div)

Rakuten Symphony.com

Office

Rakuten Crimson House, Japan

Full Time

Job Description:

About Organization:
GSD is a leading provider of specialized IT managed services, committed to delivering exceptional 24/7 support and operational excellence for critical business applications. We partner with innovative clients like Rakuten Mobile to ensure their platforms achieve maximum uptime and optimal performance. Join our dynamic team and contribute to a fast-paced, high-impact environment.

About the Role:
As an L1/L2 Support Engineer, you will play a pivotal role in ensuring the continuous availability and optimal performance of Rakuten Mobile's critical applications, including their cutting-edge eSIM service on AWS. You will serve as the first line of defense for incoming incidents (L1 responsibilities) and, on escalation, provide in-depth troubleshooting and resolution for complex technical issues (L2 responsibilities). This role bridges the gap between initial response and advanced problem-solving, driving incident resolution, contributing to problem management, and enhancing the overall stability of supported platforms. We are looking for a proactive problem-solver with a strong customer service ethic, a foundational-to-advanced understanding of cloud environments and modern application architectures, and a commitment to continuous improvement.

Key Responsibilities:

1. 24/7 Monitoring & Alert Management (L1 Focus):

・Proactively monitor application and service health, infrastructure (AWS, Kubernetes), and network alerts across all platforms, including the eSIM service.

・Utilize monitoring tools such as Grafana, Loki, Sentry, AWS CloudWatch, and automated email reports to identify anomalies and incidents.

・Analyze platform behavior and proactively identify potential issues to prevent service disruptions.

2. Incident Management & First Response (L1 Focus):

・Act as the primary point of contact for all incoming incidents, whether from monitoring alerts or reported by users/B2B customers.

・Perform initial logging, triaging, prioritization, tracking, and routing of incidents within ticketing systems (Jira, ServiceNow, Telna Ticketing Platform, Zendesk).

・Adhere strictly to defined Service Level Agreements (SLAs) for first response time.

・Perform initial troubleshooting using predefined runbooks and standard operating procedures (SOPs).

・Record events, problems, and their resolutions accurately in logs and ticketing systems.

3. Advanced Incident Resolution (L2 Focus):

・Serve as the primary escalation point for incidents unresolvable by L1, providing advanced troubleshooting and diagnosis.

・Resolve incidents within agreed-upon SLAs and timelines, leveraging runbooks, methods of procedure (MOPs), and deep technical knowledge.

・Perform deep-dive troubleshooting for application, data, integration, and underlying infrastructure-related problems.

・Analyze logs (application, system, AWS, Kubernetes, microservices) using tools like Loki, Sentry, and AWS CloudWatch to identify root causes.

・Coordinate with other support or dependency groups (internal or Rakuten Mobile's L3/DevOps) when incidents have linkages.

4. Communication & Escalation:

・Classify incidents based on severity and impact, escalating critical issues promptly to the L2 team (for L1) or the Incident Manager/Rakuten Mobile's L3/DevOps teams (for L2).

・Direct unresolved issues to the appropriate next level of support personnel.

・Provide timely and professional communication to end-users and stakeholders regarding incident status and resolution.

・Gather and pass on feedback or suggestions from customers to the appropriate internal teams.

5. B2B Customer Interaction (for eSIM service - L1/L2 Focus):

・Handle B2B customer complaints and inquiries professionally, primarily through defined playbooks and MOPs.

・Escalate B2B issues beyond L1 scope to the appropriate Rakuten Mobile teams (L1) or resolve advanced B2B issues (L2).

6. Problem Management & Stability (L2 Focus):

・Actively participate in problem management activities, including identifying recurring issues and contributing to Root Cause Analysis (RCA).

・Proactively identify potential problems, analyze technical issues, and propose permanent fixes or solutions.

・Contribute to stability analysis and continuous service improvement initiatives.

・Design and run audit plans to ensure system health and compliance.

・Work on resolving performance issues and create logical diagrams to support RCAs.

7. Operational Execution & Collaboration (L2 Focus):

・Execute complex operational tasks, including running playbooks, performing switchover/failover activities, and troubleshooting cluster failures (e.g., Kubernetes and DR-related problems).

・Collaborate closely with Rakuten Mobile's DevOps and Software Development teams, providing necessary information for pipeline support and understanding rollback strategies.

・Stay updated on relevant changes from third-party providers (e.g., Telna's API changes, service updates, operational guidelines provided by Rakuten Mobile) for troubleshooting.

・Conduct pre-checks and post-checks following service releases, patches, and hotfixes based on Rakuten Mobile-provided MOPs and playbooks.

8. Knowledge Management & Mentorship:

・Contribute to the knowledge base by documenting resolutions and common issues (L1).

・Create and update detailed knowledge base articles, runbooks, and troubleshooting guides for L1 and end-users (L2).

・Provide guidance and mentorship to L1 support engineers for their respective applications (L2).

・Provide training to L1 staff and receive training from application teams to enhance domain expertise (L2).

・Document all incidents, resolutions, and RCAs accurately for future reference and knowledge sharing.

Required Qualifications:

・Bachelor of Science (BSc) degree in Computer Science, Information Technology, or a related technical field from a nationally recognized/certified university.

・3-5+ years of hands-on experience in a technical support, network operations, or IT service desk role, preferably in a 24/7 environment (this range covers both L1 and L2 expectations).

・Proven experience with advanced configuration and troubleshooting of complex IT systems.

・Strong hands-on experience with AWS cloud infrastructure and monitoring (e.g., EC2, VPC, S3, CloudWatch, Lambda).

・Experience with containerization technologies, especially Kubernetes, and microservices architectures.

・Proficiency with monitoring tools such as Grafana, Prometheus, ELK Stack, Loki, and Sentry.

・Solid understanding of networking concepts, Linux/Unix administration, and analyzing system logs.

・Experience with database querying and basic operations.

・Proficiency in scripting (Shell/Bash, Python) for automation and troubleshooting.

・Experience with ticketing systems (e.g., Jira, ServiceNow, Zendesk).

・Excellent analytical, problem-solving, and critical-thinking skills.

・Strong communication and interpersonal skills, with the ability to explain complex technical issues clearly.

・Outstanding customer service skills and a dedication to delivering a positive customer experience.

・Ability to work effectively within a team and independently in a fast-paced, constantly changing environment, including shift work and on-call rotations for 24/7 coverage.

・High level of accountability, excellent work ethic, and a proactive attitude.

・Excellent written and oral communication skills in English and Japanese.

Preferred Qualifications:

・Experience with CI/CD pipelines and understanding of DevOps methodologies.

・ITIL Foundation certification.

・AWS certifications (e.g., Solutions Architect Associate, SysOps Administrator Associate).

・Ability to communicate in Japanese is a significant plus, especially for Rakuten accounts.

・Experience with basic fault finding and fault escalation in a network environment.

・Ability to multi-task efficiently and manage competing priorities.

What We Offer:

・Opportunity to work with cutting-edge technologies and critical telecommunications services.

・Challenging and rewarding work with opportunities for deep technical growth.

・Collaborative and supportive team environment.

・Continuous learning and professional development opportunities.

・Competitive salary and benefits package.

Languages:

English (Overall - 4 - Fluent), Japanese (Overall - 3 - Advanced)

October 9, 2025

Rakuten Symphony

Rakuten