Senior Site Reliability Engineering Manager Job at Microsoft Corporation (Redmond)

Senior Manager, Reliability Engineering & Work Management

Join Amgen’s Mission of Serving Patients. At Amgen, if you feel like you’re part...

Location

United States, Holly Springs

Salary:

148,715.15 - 201,202.85 USD / Year

Amgen

Expiration Date

Until further notice

Requirements

High school diploma / GED and 12 years of engineering, maintenance, reliability, or CMMS experience
OR Associate’s degree and 10 years of engineering, maintenance, reliability, or CMMS experience
OR Bachelor’s degree and 8 years of engineering, maintenance, reliability, or CMMS experience
OR Master’s degree and 6 years of engineering, maintenance, reliability, or CMMS experience
OR Doctorate degree and 2 years of engineering, maintenance, reliability, or CMMS experience
Minimum of 2 years of experience directly managing people and/or leadership experience leading teams, projects, or programs, or directing the allocation of resources

Job Responsibility

Lead and develop a multi-functional engineering operations organization responsible for work order administration, CMMS governance, maintenance planning and scheduling, reliability engineering, central inventory management, and KPI reporting
Lead the Central Inventory (equipment spare parts) program working closely with the Internal Service Provider as it relates to the overall MRO process supporting the site
Lead the site reliability program: align on the network approach for the Amgen North Carolina reliability roadmap, implement initiatives, and own engineering KPIs and business reporting on program and site equipment health
Establish long-range maintenance and asset reliability strategies aligned with site operational goals, regulatory compliance requirements, and enterprise engineering standards
Identify opportunities for continuous improvement, recommend solutions, and manage the implementation of initiatives that improve planning and scheduling accuracy and adherence, as well as the overall asset management program
Collaborate with key customers and support groups on preventative maintenance activities and their on-time completion, and manage the aging work order backlog
Collaborate with corporate network groups to improve planning and scheduling performance in addition to improving programs within CMMS, Central Inventory, and Reliability to meet business needs
Provide direct supervision and oversight of work order safety programs and planning procedures
Experience working in a regulated environment (e.g., cGMP, OSHA, EPA), experience interacting with regulatory agencies and inspectors, and familiarity with GMP quality systems/processes such as change control, non-conformances, corrective and preventative actions, and qualification/validation
Strong leadership, technical writing, and communication/presentation skills

What we offer

A comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts
A discretionary annual bonus program
Stock-based long-term incentives
Award-winning time-off plans
Flexible work models where possible

Fulltime

Loan IQ Product Development and Site Reliability Engineering Manager

The Applications Development Group Manager is a senior management level position...

Location

Singapore, Singapore

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

10+ years of relevant experience
5 years of experience with the Loan IQ product
10+ years of experience with Java-related technologies, including Spring Boot and Angular
5+ years of experience in platform management
Experience managing global technology teams
Working knowledge of industry practices and standards
Consistently demonstrates clear and concise written and verbal communication
Bachelor's degree/University degree or equivalent experience

Job Responsibility

Manage multiple teams of professionals to accomplish established goals and conduct personnel duties for the team (e.g., performance evaluations, hiring, and disciplinary actions)
Provide strategic influence and exercise control over resources, budget management and planning while monitoring end results
Utilize in-depth knowledge of concepts and procedures within own area and basic knowledge of other areas to resolve issues
Ensure essential procedures are followed and contribute to defining standards
Integrate in-depth knowledge of applications development with overall technology function to achieve established goals
Provide evaluative judgement based on analysis of facts in complicated, unique, and dynamic situations including drawing from internal and external sources
Influence and negotiate with senior leaders across functions, as well as communicate with external parties as necessary
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency, as well as effectively supervise the activity of others and create accountability with those who fail to maintain these standards

Fulltime

Principal Site Reliability Engineering Manager

Microsoft Substrate is the foundational cloud platform that powers many of Micro...

Location

United States, Redmond

Salary:

139,900.00 - 274,800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
OR equivalent experience
Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role
This role requires access to Microsoft Government cloud environments, including GCC Moderate (GCCM), GCC High (GCCH), and Department of Defense (DoD) environments
For access to GCCH and DoD environments, this role requires the ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
For access to GCCM environments, this role requires the ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
For manager-level roles, a Tier 5 (T5) background investigation is preferred
Candidates may be considered without currently holding these background investigations, provided they are eligible for and able to successfully obtain them

Job Responsibility

Lead and develop a team of Site Reliability Engineer ICs, providing clear expectations, regular coaching, and career guidance across senior and principal levels
Own the operational health and reliability posture of Substrate services running in regulated environments
Drive change and influence across the organization as you establish and track SLOs, SLIs, and operational metrics
Lead effective incident management and post-incident reviews
Serve as an actively engaged on-call engineer (OCE) and participate in an on-call rotation
Own reliability, resilience, and disaster recovery, including driving and coordinating DR and game day exercises
Drive engineering-led operational excellence at scale
Partner with engineering and product teams to embed reliability, security, and compliance considerations early in service design
Influence technical and operational strategy beyond your immediate team
Represent your team’s work clearly to leadership and partners

Fulltime

Principal Site Reliability Engineering Manager

The Principal SRE Manager leads the team responsible for durable, high quality h...

Location

Australia, Perth

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
OR equivalent experience
Proven experience leading teams through high-severity production incidents in large, distributed systems
Demonstrated people leadership experience managing senior engineers or technical incident leaders
Strong understanding of incident management, reliability engineering, and live site operations at scale
Ability to drive clarity, accountability, and results in ambiguous, time-critical situations
Ability to meet Microsoft, customer, and/or government security screening requirements
Ability to pass the Microsoft Cloud Background Check

Job Responsibility

Own execution quality for Substrate high-severity incidents, ensuring clear command, decisive leadership, and forward momentum during high-impact events
Act as the senior incident leader or sponsor for long-running, high-stakes, or cross-service incidents, ensuring alignment on impact, risk, and recovery priorities
Partner closely with Incident Managers, Subject Matter Experts, and service leaders to ensure effective diagnosis, escalation, and mitigation when ownership is unclear or action is blocked
Ensure high-quality post-incident reviews and drive accountability for repair items that reduce recurrence and systemic risk
Ensure consistent application of severity and priority models, outage declaration criteria, and executive escalation paths
Lead, coach, and develop a team of Site Reliability Engineers serving as incident responders
Build a culture of calm execution, accountability, psychological safety, and continuous learning during and after incidents
Hire and grow senior talent capable of operating as trusted leaders in high-pressure, executive-visible situations
Serve as a trusted advisor to engineering leaders and executives on live site risk, readiness, and incident response maturity
Communicate clearly and credibly with senior leadership during customer-impacting events

Fulltime