Amgen

Senior Reliability Engineer

United States, Holly Springs · 123098.00 - 149145.00 USD / Year · Job Posted February 20, 2026

Job Description

Be part of Amgen's newest and most advanced drug substance manufacturing plant. When completed, the Amgen FleXBatch facility will combine the latest disposable technologies with traditional stainless-steel equipment to allow maximum flexibility in operations. The facility will feature best-in-class drug substance manufacturing technologies with embedded Industry 4.0 capabilities, and it will integrate sustainability innovations that reduce carbon and waste as part of Amgen's plan to be a carbon-neutral company by 2027.

Job Responsibility

  • Lead all aspects of the delivery and continuous improvement of the Engineering Asset Management (AM), Reliability, Sustainability, and Continuous Improvement (CI) programs at Amgen North Carolina (ANC)
  • Serve as the primary system owner and subject matter expert for Engineering AM, CI, alarm management, data analytics, and audit readiness programs, acting as a key liaison between ANC Engineering, Global Asset Management, Sustainability, Quality, and Reliability organizations
  • Deploy and sustain a comprehensive, standardized Reliability Program aligned with corporate Reliability, Sustainability, and Industry 4.0 strategies
  • Establish and monitor standardized metrics across Manufacturing, Packaging, Laboratories, Maintenance, and Utilities to identify performance gaps, regulatory risks, and major reliability offenders, and to drive data-informed, risk-based improvement plans
  • Lead Asset Management, Reliability, CI, Sustainability, Alarm Management, and Audit Readiness programs
  • Establish data-driven reliability frameworks using analytics, dashboards, and KPIs
  • Lead MMP activities including PM optimization, job plans, and spare parts strategy
  • Develop risk-based action plans for reliable and sustainable utilities operations
  • Own alarm management and alarm review programs aligned with regulatory expectations
  • Drive sustainability initiatives (energy, waste, lifecycle optimization)
  • Ensure continuous audit readiness and regulatory inspection support

Requirements

  • High School Diploma / GED and 10 years of Engineering experience
  • Associate’s Degree and 8 years of Engineering experience
  • Bachelor’s Degree and 4 years of Engineering experience
  • Master’s Degree and 2 years of Engineering experience
  • Doctorate Degree

Nice to have

  • Bachelor’s degree and 5 years of experience (or equivalent combination)
  • Experience in Reliability Engineering, data analytics, GMP, sustainability integration, predictive maintenance, alarm management, and audit readiness
  • The ideal candidate is a self-directed, excellent teammate who is ready to mentor and develop engineering staff and who embraces a team-based culture that relies on collaboration for effective decision-making
  • Strong leadership, technical writing, and communication/presentation skills

What we offer

  • Competitive and comprehensive Total Rewards Plans that are aligned with local industry standards

Similar Jobs for Senior Reliability Engineer

8 matching positions

Senior Reliability Engineer

Are you looking for a career move that will put you at the heart of a global fin...
Location: Poland, Warsaw
Salary: 241750.00 - 411650.00 PLN / Year
Citi
Expiration Date: Until further notice
Requirements
  • Senior software developer with scripting experience in Python/Shell/Bash/Java/Go
  • Experience in CI/CD, in large enterprise tech-stack infra-architecture
  • Hands-on experience deploying and scaling the Model Context Protocol (MCP) in complex, enterprise environments
  • Understanding of SRE/DevOps/CI/CD
  • Strong analytical, algorithmic, and problem-solving skills
  • Excellent teamwork, proactive attitude, strong communication skills, both written and oral
Job Responsibility
  • You will work in an agile software development environment, developing quality and scalable software solutions using leading-edge technologies
  • You will work closely with developers, engineers and non-technology employees to help them be more productive with the use of the CI/CD tools
  • You will collaborate with Citi Developer Services engineers to automate manual and repetitive processes, integrate services with AI (by building and maintaining MCP - Model Context Protocol servers), enhance system resiliency, and coordinate service issue investigations by deploying best practices
  • Automate manual activities, repetitive processes, reporting, controls, etc., configure and tune them
  • Build and maintain the foundational MCP servers that allow AI models to securely interact with enterprise systems
  • Continuously improve system resiliency, reliability, and business cost through the design and development of software solutions and streamlined processes
  • Mitigate risk by analyzing the root cause of production issues, impacts to business, and required corrective actions.
What we offer
  • Employer paid Defined Contribution Pension Plan contribution of 6% of employee’s pensionable earnings (PPE Program)
  • Employer paid Private Medical Care Package for employees and Private Medical Care Packages for certain family members available at preferential rates
  • Employer paid Life Insurance Program for employees and Life Insurance for certain family members available at preferential rates
  • Employee Assistance Program financed by Employer
  • Paid Parental Leave Program (maternity and paternity leave; statutory and 2 weeks additional paid paternity leave)
  • Sport Card for employees subsidised via Social Benefits Fund and Sport Cards for certain family members available at preferential rates
  • Additional benefits from Company’s Social Benefit Fund, in particular: Holidays Allowance, support for sport and cultural activities, team building events
  • Additional day off for volunteering
  • Cafeteria/ flex benefit – a company benefits system which enables employees to select and purchase benefits offered by a provider and available for employees on the platform
  • Full-time

Senior Reliability Engineer

The incumbent will be responsible to maintain site-wide equipment and facilities...
Location: Singapore, Tuas
Salary: Not provided
Pfizer
Expiration Date: Until further notice
Requirements
  • Minimum 6 years for Degree or 15 years for Diploma in Mechanical or Chemical Engineering, with relevant experience in the pharmaceutical, chemical, or petrochemical industries
  • Experience with Root Cause Failure Analysis, Equipment Criticality Ranking, PM/PdM optimization, and/or Failure Modes and Effects Analysis
  • Strong knowledge and understanding of Current Good Manufacturing Practices (part of GxP)
  • Excellent oral and written communication skills
  • Working knowledge of MS Excel
  • Ability to manage complex issues and foster consensus among teams
  • Familiar with government code of practice, regulations, current Good Manufacturing Practice (cGMP), Good Documentation Practice (GDP) and Data Integrity (DI)
  • Good Mechanical Maintenance Troubleshooting, Repairs and Analysis Skills
  • Good Facilitation and Communication skills
  • Demonstrated problem-solving and relationship management skills
Job Responsibility
  • Maintain site-wide equipment and facilities, establishing optimization initiatives and Right First Time strategies and techniques to achieve high equipment and instrument reliability in compliance with cGMP, EHS, data integrity, and regulatory requirements in a cost-effective manner
  • Execute the reliability programs for plant mechanical equipment, instruments, and systems by establishing proactive, predictive, and preventive maintenance programs, conducting equipment inspections, analyzing data, providing recommendations, and following up with monitoring and improvements
  • Accountable for: cGMP and EH&S compliance
  • Mechanical System / Equipment failure and Root Cause Analysis
  • Equipment, instrument and system reliability and performance
  • Team performance
  • Maintenance PM work planning and data tracking
  • Report generation
  • Implementation, Execution, Commissioning/testing of new and obsolete Mechanical systems and equipment
  • Full-time

Senior Site Reliability Engineer - Fleet Reliability

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...
Location: United States, San Francisco
Salary: 230000.00 - 345000.00 USD / Year
Lambda
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • Strong understanding of modern AI infrastructure, from GPU architectures to hardware performance optimization
  • Strong understanding of Linux-based systems in a distributed environment
  • Solid understanding of Python and Go, with experience working with SWE teams to improve internal tooling
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, SumoLogic)
  • Proficiency in automation and configuration management tools (e.g., Ansible, Terraform)
  • Familiarity with cloud platforms (e.g., OCI, AWS, GCP, Azure)
  • Excellent problem-solving and troubleshooting skills
  • Strong communication and collaboration skills
  • Passion for continuous improvement and innovation
Job Responsibility
  • Define Fleet Health metrics and indicators to objectively measure and improve system availability
  • Collaborate with the observability team on comprehensive monitoring and alerting systems to proactively predict, detect and respond to issues or anomalies
  • Create runbooks and automated remediations for common failure scenarios
  • Build in automation and auditing to ensure compliance and improve efficiency and productivity
  • Participate in on-call rotations and provide support for incident response and resolution
  • Implement and integrate logging and metrics across platforms such as Datadog, Prometheus, OpenTelemetry, Grafana, and SumoLogic
What we offer
  • Generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan
  • Full-time

Senior Reliability Engineer - AV Labs

We are looking for a hardware focused Senior Reliability Engineer to focus on se...
Location: United States, Sunnyvale
Salary: 180000.00 - 200000.00 USD / Year
Uber
Expiration Date: Until further notice
Requirements
  • 5+ years of relevant industry experience in software engineering, site reliability, or systems engineering
  • Experience with modern observability platforms (e.g., Prometheus, Grafana, ELK) in edge, IoT, or hardware-integrated environments
  • Coding skills in one or more of Go, Python, or C++, with experience building and operating production systems
  • Proficiency in Linux internals and shell scripting for triaging and debugging edge devices or hardware-adjacent systems
  • Ability to debug across services, containers (Docker), and networking stacks
  • Proven track record owning reliability, infrastructure, or platform systems for large-scale production workloads
  • Experience designing and operating observability systems (metrics, logging, alerting, and dashboards)
  • Experience defining and implementing SLIs and SLOs for system availability or data yield
  • Deep understanding of networking protocols (TCP/IP, gRPC, or MQTT) and data handling in bandwidth-constrained environments
  • Experience driving complex technical projects and architectural reviews across multiple teams from design through production
Job Responsibility
  • Architect Observability Systems: Design and scale an observability platform capable of ingesting and analyzing real-time health telemetry from thousands of distributed vehicle nodes
  • Build for Edge Constraints: Develop systems that remain performant despite hardware diversity, intermittent connectivity, and rapid fleet scaling
  • Define Criticality Models: Establish alerting strategies that distinguish transient anomalies from systemic issues impacting sensor uptime and data yield
  • Detect Complex Failure Modes: Design detection logic for 'silent' failures, such as sensor degradation, compute saturation, or recording pipeline stalls
  • Scale Through Automation: Design automated detection, triage, and mitigation mechanisms to eliminate manual intervention as the fleet grows
  • Partner on Mitigation: Collaborate with Operations and Engineering to build safe, automated responses to recurring hardware and software failure scenarios
  • Drive Operational Efficiency: Build technical interfaces to help Operations surface issues and Engineering diagnose and deploy mitigations rapidly (TTD/TTM)
  • Lead Technical Strategy: Drive reliability-focused design reviews and translate operational pain points into concrete technical requirements and roadmaps
  • Uncover Proactive Insights: Apply advanced data analytics to identify latent patterns in fleet telemetry, enabling the proactive detection of systemic regressions and hardware degradation before they impact operations
What we offer
  • Uber's bonus program
  • equity award & other types of comp
  • 401(k) plan
  • various benefits
  • Full-time

Senior Engineer, Reliability (Mechanical)

Location: Malaysia, Manjung
Salary: Not provided
Airswift Sweden
Expiration Date: Until further notice
Requirements
  • Engineering Background: Preferably Mechanical, Electrical, or related disciplines
  • Experience: Minimum 8 years (at least 5 years in a relevant reliability or maintenance role)
  • Industry Exposure: Candidates from O&L, heavy machinery, mining, or cement industries are highly preferred
  • Strong background in mechanical conveyors and bulk material handling systems
  • Ability to interpret technical drawings and support simple fabrication needs
  • Familiarity with steel fabrication, welding, and vibration analysis
  • Proficient in root cause analysis (RCA), FMEA, LDA, and other reliability tools
  • Strong data analysis capabilities, especially using CMMS (e.g., SAP)
  • Coordinate across multiple teams: execution, planning, process inspection
  • Lead and develop a multi-skilled team, including mechanical and electrical engineers
Job Responsibility
  • 80% strategic focus on equipment lifecycle cost optimization
  • 20% tactical involvement in daily maintenance and reliability operations
  • Full-time

Senior Reliability Engineer - PCBA, Harness & Connectors

We are looking for a Senior Reliability Engineer in charge of developing and exe...
Location: United States, San Jose
Salary: 150000.00 - 225000.00 USD / Year
Figure
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in relevant reliability engineering areas
  • Bachelor's degree or higher in relevant science and engineering fields
  • Strong knowledge of environmental reliability test principles, models, and methodologies, such as high temperature high humidity, thermal cycle/shock, mechanical vibration/shock
  • Strong knowledge of industry test standards such as AEC-Q, JEDEC, and IPC
  • Strong knowledge of electrical circuits, PCBA design and relevant SW tools (e.g. Altium)
  • Strong knowledge of PCBA, harness and connector failure modes, mechanisms, and FA techniques
  • Hands-on experience on field reliability risk analysis and failure prediction methods
  • Hands-on experience with Weibull++, JMP, or other reliability statistical analysis software
  • Hands-on experience on electronic circuit debug and relevant tools, e.g. source meter, oscilloscope
  • Hands-on experience with 3D CAD tool (e.g. CATIA)
Job Responsibility
  • Work with cross-functional teams, own hardware reliability requirements and validation strategy
  • Develop and execute accelerated life tests for PCBAs, electronic components, electrical harness and connectors
  • Lead DFMEA efforts with design engineers to assess design risks, impacts, controls, and corrective actions
  • Design reliability test flows and procedures, communicate with internal and external/CM teams to execute tests and report results
  • Work with test engineers to design setup and fixtures used in reliability testing
  • Guide and support PCBA, harness, connector failure analysis, design of experiments (DOEs) and corrective action processes with cross-functional teams
  • Analyze field data, assess field risks, and design tests that correlate to field usage conditions
  • Full-time

Senior Site Reliability Engineer, Wikimedia Enterprise

The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to jo...
Location: United States
Salary: 116633.00 - 181243.00 USD / Year
Wikimedia Foundation
Expiration Date: Until further notice
Requirements
  • Automation & Configuration Management: Experience with Infrastructure as Code and automation tools (e.g., Terraform, Ansible) and proficiency in at least one programming language (e.g., Python, Go, or similar)
  • Cloud Infrastructure: Experience designing, operating, and optimizing cloud-based systems across platforms such as AWS, Azure, or GCP, including scalability, reliability, and cost efficiency
  • CI/CD & Deployment Practices: Experience building and maintaining CI/CD pipelines and GitOps workflows (e.g., GitLab or similar, ArgoCD), with familiarity in progressive delivery approaches such as canary and blue-green deployments
  • Incident Management & Reliability Operations: Experience with incident response, on-call practices, and leading postmortems, with a focus on continuous improvement and operational excellence
  • SRE Principles & Observability: Strong understanding of SRE best practices, including SLOs, SLIs, and error budgets, along with experience in observability (metrics, logging, and distributed tracing, e.g., Prometheus, OpenTelemetry)
  • Collaboration & Communication: Ability to work effectively in a distributed, cross-functional environment, with strong documentation and communication skills
  • Proven experience operating highly available, large-scale distributed systems, with a deep understanding of reliability, scalability, and failure modes
  • Ownership mindset: Takes end-to-end responsibility for system reliability, proactively identifying and addressing risks before they impact users
  • Bias for automation: Continuously seeks to reduce operational toil through automation and scalable solutions
  • Continuous improvement mindset: Actively learns from incidents and drives improvements through blameless postmortems and iterative enhancements
Job Responsibility
  • Define, track, and improve Service Level Objectives (SLOs), SLIs, and error budgets to ensure reliability targets are met
  • Build and enhance observability systems (metrics, logs, and distributed tracing) to enable proactive detection and faster troubleshooting
  • Drive reliability engineering practices, including capacity planning, load testing, and resilience validation (e.g., chaos testing)
  • Improve developer experience (DevEx) by enabling self-service infrastructure and streamlining deployment workflows
  • Partner with engineering team members to embed reliability best practices early in the development lifecycle
  • Design, implement, and optimize CI/CD and GitOps workflows using tools such as GitLab (or similar) and ArgoCD (or similar), enabling automated, reliable deployments with support for progressive delivery strategies like canary and blue-green releases
  • Implement secure-by-default infrastructure and enforce best practices (e.g., IAM, secrets management, encryption)
  • Continuously optimize infrastructure cost and efficiency using FinOps principles while maintaining performance and availability
  • Establish and track operational metrics such as MTTR, MTTD, and incident frequency to drive continuous improvement
  • Reduce operational toil by identifying repetitive work and implementing automation-first solutions
  • Full-time

Senior Site Reliability Engineer

The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to su...
Location: United States
Salary: 113082.00 - 175725.00 USD / Year
Wikimedia Foundation
Expiration Date: Until further notice
Requirements
  • 6+ years experience in an SRE/Operations/DevOps role as part of a team
  • Experience with shell and any scripting language used in an SRE context (Python, Go, Bash, Ruby; we primarily use Python) and configuration management tools (Puppet, Ansible; we use Puppet)
  • Experience with distributed caching systems: including their underlying algorithms and how to optimize their performance
  • Experience with package management on Linux systems (we use Debian)
  • Strong Linux system-level troubleshooting skills
  • History of automating tasks and processes, identifying process gaps, and finding automation opportunities
  • Strong English language skills (verbal and written) and ability to work independently, as an effective part of a globally distributed team working across multiple time zones
  • Experience leading and participating in incident response and post-incident review rituals, with the goal of conducting root cause analysis and implementing preventive measures
Job Responsibility
  • Performing day-to-day operational/DevOps tasks on Wikimedia’s public facing infrastructure (deployment, maintenance, configuration, troubleshooting)
  • Implementing and utilizing configuration management and deployment tools (Puppet, Kubernetes)
  • Leading continuous improvement, by automating the installation, configuration and maintenance of services on our platform
  • Working closely with product teams helping them bring scalable functionality to our users by assisting in the architectural design of new services and making them operate at scale
  • Participating in a 24/7 on-call rotation shared across the broader SRE team. This includes taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia’s production infrastructure.
  • Collaborating with a global, cross-functional team in an asynchronous communication environment
  • Mentoring peers in your areas of technical and operational strength
  • Full-time