CrawlJobs Logo

Sre Manager

https://www.randstad.com Logo

Randstad

Location Icon

Location:
Japan , Tokyo

Category Icon

Job Type Icon

Contract Type:
Employment contract

Salary Icon

Salary:

8000000.00 - 13000000.00 JPY / Year

Job Description:

Big IT budget and implements the latest technology. Remote and flex work available.

Requirements:

  • 7+ years of experience in SRE or cloud-related roles
  • Proficiency in scripting and automation languages (e.g., Python, Bash, PowerShell)
  • Fluent~Native Japanese (JLPT N2+), English (TOEIC 700+)
What we offer:
  • health insurance
  • welfare pension insurance
  • unemployment insurance
  • weekends off
  • national holidays off

Additional Information:

Job Posted:
May 09, 2025

Employment Type:
Fulltime
Work Type:
Hybrid work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Sre Manager

Manager, Site Reliability Engineering and Incident Management

Planet DDS is seeking a Manager, Site Reliability Engineering and Incident Manag...
Location
Location
United States , Atlanta
Salary
Salary:
118000.00 - 160000.00 USD / Year
planetdds.com Logo
Planet DDS
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years in SRE, DevOps, or Infrastructure roles
  • 3+ years in Incident Management leadership
  • Deep understanding of reliability, scalability, and performance optimization
  • Multi-cloud expertise in AWS, Azure, or GCP
  • Understanding of DNS, load balancing, firewalls, and compliance frameworks
  • Knowledge of fundamental cloud security (e.g., identity and access management, firewalls)
  • Deep understanding of logging and monitoring and security best practices
  • Strong collaboration and communication skills
  • Bachelor’s Degree in a relevant major or equivalent years of experience is a plus
Job Responsibility
Job Responsibility
  • Lead and mentor a team of SREs and Incident Managers
  • Foster a culture of reliability, accountability, and continuous improvement
  • Collaborate with engineering teams to design resilient platform architectures
  • Oversee the incident response process for outages and service disruptions
  • Ensure timely detection, escalation, and resolution of incidents
  • Drive post-incident reviews (PIRs) and root cause analysis
  • Implement improvements based on lessons learned to prevent recurrence
  • Mature and enforce best practices for incident response and runbooks
  • Automate operational tasks to reduce toil and improve efficiency
  • Maintain observability tools (monitoring, alerting, logging)
  • Fulltime
Read More
Arrow Right

Site Reliability Engineering Manager

The Wikimedia Foundation is looking for an Engineering Manager to join our SRE t...
Location
Location
United States of America
Salary
Salary:
132439.00 - 208378.00 USD / Year
wikimediafoundation.org Logo
Wikimedia Foundation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Prior experience managing teams
  • Prior hands-on experience with software or reliability engineering (within the last 3 years preferred)
  • Ability to analyze complex systems, troubleshoot issues, and devise effective solutions under pressure
  • Proficiency in project management methodologies to effectively plan, execute, and track new and existing initiatives
  • Strong understanding of cloud computing, networking, Linux systems administration, containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible) to be able to provide technical support to the team
  • Aptitude for automation and streamlining of tasks
  • Communicate effectively in both spoken and written English
  • Ability to work independently, as an effective part of a globally distributed team
  • Ability to travel several times a year for occasional in-person meetings
  • B.S. or M.S. in Computer Science or the equivalent in related work experience
Job Responsibility
Job Responsibility
  • Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization
  • Providing guidance, mentorship, and support to ensure the team's effectiveness and growth
  • Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path
  • Recruiting, hiring, and helping onboard new team members
  • Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members
  • Coordinating and communicating with other members of the Wikimedia product & engineering teams on relevant projects, executing complex projects and contributing to the organizational strategy
  • Continuously developing the roadmap of the team in alignment with other SRE and Product & Technology teams, and helping to draft and execute the team’s annual and quarterly plans
  • Project managing new and existing initiatives
  • Leading the definition, refinement, and execution of the processes through which the team manages and performs work
  • Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure
  • Fulltime
Read More
Arrow Right

Engineering Manager, Infrastructure

As an Engineering Manager for the Infrastructure team, you’ll lead the engineers...
Location
Location
Canada; United States
Salary
Salary:
195000.00 - 285000.00 USD / Year
apollo.io Logo
Apollo.io
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of hands-on software or infrastructure engineering experience
  • 2+ years of experience leading teams of senior and staff-level engineers in platform, SRE, or infrastructure domains
  • Proven ability to design and operate large-scale distributed systems in cloud environments (preferably GCP or AWS)
  • Expertise with Kubernetes, Docker, Terraform, Ubuntu, and CI/CD pipelines
  • Familiarity with observability tools (Grafana, Prometheus, ELK, Datadog, NewRelic) and performance tuning
  • Strong grounding in networking, security, and reliability principles
  • Experience managing infrastructure costs, availability SLAs, and high-throughput systems at scale
Job Responsibility
Job Responsibility
  • Lead, coach, and grow a distributed team of high-impact Infrastructure Engineers
  • Partner with senior engineering leadership on strategic initiatives such as cloud migration, infrastructure scaling, platform reliability, and cost efficiency
  • Define and implement modern operational excellence practices, including SLOs, error budgets, incident reviews, and performance monitoring
  • Guide technical decision-making across key areas like Kubernetes, GCP, observability, networking, CI/CD, and IaC (Terraform, Ansible)
  • Collaborate with AI, Data, and Product Engineering teams to ensure infrastructure scalability for ML and AI-native workloads
  • Run effective 1:1s, career development conversations, and quarterly performance reviews
  • Support recruiting efforts to attract top engineering talent across time zones
What we offer
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
  • Fulltime
Read More
Arrow Right

Engineering Manager, Platform

We are looking for an engineering manager to help us scale, improve organisation...
Location
Location
Salary
Salary:
Not provided
airalo.com Logo
Airalo
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Minimum 5 years of hands-on technical experience in cloud-native environments, specifically with distributed systems and platform development
  • Minimum 2 years of experience in directly leading and managing platform, DevOps, or SRE teams
  • Expertise in designing, building, refactoring, and operating distributed systems and scalable cloud infrastructure at scale
  • Expertise in event-driven architecture and various Messaging systems (e.g., Kafka, SQS, RabbitMQ, Pub/Sub)
  • Strong knowledge of both relational (SQL) and NoSQL database technologies and their operational considerations in cloud environments
  • Extensive hands-on experience and deep understanding of core AWS services (e.g., EC2, EKS, Lambda, SQS, Security Groups, IAM, Aurora, DynamoDB, S3, RDS, CloudWatch, CloudTrail)
  • Proven expertise with Infrastructure as Code (e.g., Terraform, CloudFormation)
  • Strong experience with containerisation technologies (Docker) and orchestration platforms (Kubernetes), including Helm and related ecosystem tools
  • Extensive experience with modern monitoring, logging, and observability platforms (e.g., Datadog, Prometheus, Grafana, ELK Stack, Jaeger/OpenTelemetry)
  • Strong familiarity with DevSecOps practices and the implementation of automated security tooling throughout the CI/CD pipeline (e.g., SAST, DAST, secret management, vulnerability scanning)
Job Responsibility
Job Responsibility
  • Lead the strategy, architecture, and execution of our core platform technologies
  • Extend and improve engineering best practices across the organisation
  • Maintain and improve a collaborative environment, acting as a key bridge between application development teams and the platform team
  • Motivate and instil a strong sense of ownership in your team for the end-to-end lifecycle, stability, scalability, and performance of our core platform services
  • Mentor and guide the professional and technical development of your team members
  • Ensures that the team delivers high quality products and solutions by following the best practices
  • Build and scale teams that are collaborative, inclusive, and respectful of each other
  • Provide continuous, actionable feedback, address underperformance proactively, and recognise the individual strengths and contributions of your team members
  • Work closely with engineers and collaborate with key stakeholders to define, maintain a prioritised backlog, and establish clear short-term and long-term goals for the platform roadmap
  • Own your team’s deliverables and ensure the continuous delivery of scalable, highly-available, and cost-efficient platform services and infrastructure
What we offer
What we offer
  • Health Insurance
  • work-from-anywhere stipend
  • annual wellness & learning credits
  • annual all-expenses-paid company retreat in a gorgeous destination
  • Fulltime
Read More
Arrow Right

Platform Engineering Manager

Koddi is seeking a Platform Engineering Manager to lead a team dedicated to clou...
Location
Location
United States , Fort Worth
Salary
Salary:
Not provided
koddi.com Logo
Koddi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Minimum of 5 years experience in a DevOps/Platform/SRE role
  • Minimum of 3 years of proven experience managing and motivating engineering teams, in a DevOps/Platform/SRE role
  • Strong experience with the following cloud technologies: Cloud computing, including auto-scaling and load balancing (EC2, VMs, etc.)
  • Cloud storage (S3, BLOB, etc.)
  • Container services (ECS, etc.), Kubernetes services (EKS, etc.)
  • IAM
  • VPCs
  • Demonstrated ability to deliver results under pressure with a quick, precise, style
  • Strong habit of turning ambiguity into actionable plans with clear owners, dates, and milestones
  • Experience providing feedback to leadership on strategic initiatives
Job Responsibility
Job Responsibility
  • Set clear weekly milestones, hold the team accountable, and provide reminders and follow-ups
  • Drive execution by unblocking engineers, reducing bottlenecks, and keeping projects on track
  • Partner with the VP of Infrastructure on goals, priorities, and shifting milestones
  • Communicate progress concisely to engineering leads, stakeholders and Product
  • Own delivery timelines, surface risks early, and ensure corrective actions are taken quickly
  • Mentor and develop engineers through coaching, feedback, and professional growth opportunities
  • Build strong team relationships that foster trust, accountability, and long-term success
Read More
Arrow Right

FX Applications Support Senior Analyst

As an OpsTech Application Support Analyst, the candidate will play a pivotal rol...
Location
Location
Australia , Sydney
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5-8 years experience in an Application Support role
  • experience installing, configuring or supporting business applications
  • experience with some programming languages and willingness/ability to learn
  • advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • effective written and verbal communications including ability to explain technical issues in simple terms that non-IT staff can understand
  • demonstrated analytical skills
  • issue tracking and reporting using tools
  • knowledge/experience of problem Management Tools
  • good all-round technical skills
  • effectively share information with other support team members and with other technology teams
Job Responsibility
Job Responsibility
  • Provide technical and business support for users of Citi Applications
  • maintain application systems
  • manage, maintain and support applications
  • perform start of day checks, continuous monitoring, and regional handover
  • develop and maintain technical support documentation
  • maximize the potential of applications
  • assess risk and impact of production issues and escalate
  • ensure storage and archiving procedures are functioning correctly
  • formulate and define scope and objectives for complex application enhancements
  • prioritize bug fixes and support tooling requirements
What we offer
What we offer
  • Rewarding work in a supportive environment
  • clear opportunities for progression
  • exciting company benefits
  • Fulltime
Read More
Arrow Right
New

Senior Site Reliability Manager

RUCKUS Networks is seeking an experienced Site Reliability Engineering (SRE) Man...
Location
Location
United States , Sunnyvale
Salary
Salary:
135600.00 - 200000.00 USD / Year
commscope.com Logo
CommScope
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 12+ years in Site Reliability Engineering (SRE), with 6+ years leading SRE, DevOps, or infrastructure teams
  • Proven experience mentoring engineering managers and developing leadership talent
  • Track record of transforming traditional operations or NOC teams into modern SRE organizations
  • Strong project management skills with Agile/Kanban experience and JIRA proficiency
  • Excellent communication skills, including executive-level presentations
  • Deep SRE expertise: incident management, on-call systems, monitoring, and reliability engineering
  • Infrastructure automation experience with Terraform, Kubernetes, Docker, and CI/CD pipelines
  • Cloud platform proficiency (GCP/AWS), including networking, security, and cost optimization
  • Monitoring and observability experience with Prometheus, Grafana, APM tools, and log aggregation
  • 24/7 operations experience with global team coordination and escalation management
Job Responsibility
Job Responsibility
  • Lead and develop engineering managers and technical operations engineers across India and APAC time zones
  • Build a collaborative team culture that emphasizes knowledge sharing, automation, and operational excellence
  • Mentor engineering managers to strengthen leadership capabilities and technical expertise
  • Set clear performance expectations and provide ongoing coaching for growth
  • Partner cross-functionally with Product, Security, Development, and global operations teams
  • Own 24/7 operational stability for India/APAC, including incident response, escalation, and post-incident reviews
  • Drive comprehensive incident management: alert handling, outage response, and root cause analysis (RCA/CAR)
  • Transform traditional operations into modern SRE practices using SLOs, error budgets, and reliability engineering
  • Implement robust monitoring and alerting with APM tools, dashboards, and automation frameworks
  • Lead technical project delivery with clear timelines, resource planning, and stakeholder communication
What we offer
What we offer
  • medical, dental, and vision plans
  • life and accidental death insurance
  • a 401(k) plan
  • participation in the Company’s Incentive Plan
  • eleven paid holidays in a full calendar year
  • two weeks of paid vacation (prorated based on start date)
  • other leave options
  • Fulltime
Read More
Arrow Right

Director, Service Reliability Engineering

As Director of SRE, you will lead the team responsible for accelerating and auto...
Location
Location
United States , Bethesda
Salary
Salary:
125600.00 - 203700.00 USD / Year
https://www.marriott.com Logo
Marriott Bonvoy
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Undergraduate degree in computer science, software engineering, or a related field (or equivalent experience)
  • 10+ years of experience in SRE, devsecops or IT operations
  • At least 5 years’ experience in a previous leadership role within SRE, devsecops or IT Operations
  • At least five years of experience in the following technologies - Presentation Management: HTML, CSS, JS, Backbone, Node JS, Android, iOS, Application Platforms: NGINX, Java, Akana, Play Framework, Tomcat, Docker, Openshift, Application Data: PostgreSQL, Couchbase, Cassandra, Integration Services: Apache Kafka, Apache Spark, Akana, Analytics Platforms: Hadoop, dashDB, Cognos, Tableau, Security: Forgerock, OpenID, OAUTH, Ping Identity, Public Cloud: Azure, Google Cloud, AliCloud, Amazon Web Services, CI/CD: Harness
  • Experience with test automation
  • Working knowledge and proven track record of implementing disaster indifferent architecture
  • Experience with CDN and Akamai tools
  • Linux/Unix system administration experience
  • Proficient in scripting and programming languages (like Python, Go, Bash, Shell)
  • Hands on experience with infrastructure as code (like Terraform), container orchestration (like Kubernetes), and reliability automation
Job Responsibility
Job Responsibility
  • Define and execute Marriott’s SRE vision, aligning with business objectives and technology roadmaps
  • Build, mentor and lead a high-performing SRE team, fostering a culture of collaboration and innovation
  • Establish reliability, observability and automation goals to improve system uptime, performance and scalability
  • Partner with engineering, operations and security teams to drive best practices and continuous improvement
  • Implement reliability-focused engineering practices, including SLAs, SLOs/SLIs and error budgets
  • Design and maintain resilient, scalable and fault-tolerant architectures across cloud and hybrid environments
  • Develop strategies to proactively identify and mitigate risks to system performance and availability
  • Drive root cause analysis (RCA) and post-mortem processes to prevent recurring incidents
  • Champion automation in monitoring, deployment and incident resolution to reduce toil and enhance efficiency
  • Lead and optimize incident response processes, ensuring rapid detection, diagnosis, and resolution of system failures
What we offer
What we offer
  • Bonus program
  • comprehensive health care benefits
  • 401(k) plan with up to 5% company match
  • employee stock purchase plan at 15% discount
  • accrued paid time off (including sick leave where applicable)
  • life insurance
  • group disability insurance
  • travel discounts
  • adoption assistance
  • paid parental leave
  • Fulltime
Read More
Arrow Right