Senior Reliability Engineer Job at Barbaricum (Washington)

Senior Reliability Engineer

Are you looking for a career move that will put you at the heart of a global fin...

Location

Poland, Warsaw

Salary:

241750.00 - 411650.00 PLN / Year

Citi

Expiration Date

Until further notice

Requirements

Senior software developer with scripting experience in Python/Shell/Bash/Java/Go
Experience with CI/CD in large enterprise technology stacks and infrastructure architecture
Hands-on experience deploying and scaling the Model Context Protocol (MCP) in complex, enterprise environments
Understanding of SRE/DevOps/CI/CD
Strong analytical, algorithmic, and problem-solving skills
Excellent teamwork, proactive attitude, strong communication skills, both written and oral

Job Responsibility

You will work in an agile software development environment, developing quality and scalable software solutions using leading-edge technologies
You will work closely with developers, engineers and non-technology employees to help them be more productive with the use of the CI/CD tools
You will collaborate with Citi Developer Services engineers to automate manual and repetitive processes, integrate services with AI (by building and maintaining MCP - Model Context Protocol servers), enhance system resiliency, and coordinate service issue investigations by deploying best practices
Automate manual activities, repetitive processes, reporting, controls, etc., configure and tune them
Build and maintain the foundational MCP servers that allow AI models to securely interact with enterprise systems
Continuously improve system resiliency, reliability, and business cost through the design and development of software solutions and streamlined processes
Mitigate risk by analyzing the root cause of production issues, impacts to business, and required corrective actions.

What we offer

Employer paid Defined Contribution Pension Plan contribution of 6% of employee’s pensionable earnings (PPE Program)
Employer paid Private Medical Care Package for employees and Private Medical Care Packages for certain family members available at preferential rates
Employer paid Life Insurance Program for employees and Life Insurance for certain family members available at preferential rates
Employee Assistance Program financed by Employer
Paid Parental Leave Program (maternity and paternity leave: statutory, plus 2 weeks of additional paid paternity leave)
Sport Card for employees subsidised via Social Benefits Fund and Sport Cards for certain family members available at preferential rates
Additional benefits from Company’s Social Benefit Fund, in particular: Holidays Allowance, support for sport and cultural activities, team building events
Additional day off for volunteering
Cafeteria/flex benefit: a company benefits system that lets employees select and purchase benefits offered by a provider on the platform

Fulltime

Senior Reliability Engineer

Be part of Amgen's newest and most advanced drug substance manufacturing plant. ...

Location

United States, Holly Springs

Salary:

123098.00 - 149145.00 USD / Year

Amgen

Expiration Date

Until further notice

Requirements

High School Diploma / GED and 10 years of Engineering experience
Associate’s Degree and 8 years of Engineering experience
Bachelor’s Degree and 4 years of Engineering experience
Master’s Degree and 2 years of Engineering experience
Doctorate Degree

Job Responsibility

Lead all aspects of the delivery and continuous improvement of the Engineering Asset Management (AM), Reliability, Sustainability, and Continuous Improvement (CI) programs at Amgen North Carolina (ANC)
Serve as the primary system owner and subject matter expert for Engineering AM, CI, alarm management, data analytics, and audit readiness programs, acting as a key liaison between ANC Engineering, Global Asset Management, Sustainability, Quality, and Reliability organizations
Deploy and sustain a comprehensive, standardized Reliability Program aligned with corporate Reliability, Sustainability, and Industry 4.0 strategies
Establish and monitor standardized metrics across Manufacturing, Packaging, Laboratories, Maintenance, and Utilities to identify performance gaps, regulatory risks, and major reliability offenders, and to drive data-informed, risk-based improvement plans
Lead Asset Management, Reliability, CI, Sustainability, Alarm Management, and Audit Readiness programs
Establish data-driven reliability frameworks using analytics, dashboards, and KPIs
Lead MMP activities including PM optimization, job plans, and spare parts strategy
Develop risk-based action plans for reliable and sustainable utilities operations
Own alarm management and alarm review programs aligned with regulatory expectations
Drive sustainability initiatives (energy, waste, lifecycle optimization)

What we offer

Competitive and comprehensive Total Rewards Plans aligned with local industry standards

Fulltime

Senior Reliability Engineer

The incumbent will be responsible to maintain site-wide equipment and facilities...

Location

Singapore, Tuas

Salary:

Not provided

Pfizer

Expiration Date

Until further notice

Requirements

Degree or Diploma in Mechanical or Chemical Engineering, with a minimum of 6 years (Degree) or 15 years (Diploma) of relevant experience in the pharmaceutical, chemical, or petrochemical industries
Experience with Root Cause Failure Analysis, Equipment Criticality Ranking, PM/PdM optimization, and/or Failure Modes and Effects Analysis
Strong knowledge and understanding of Current Good Manufacturing Practices (part of GxP)
Excellent oral and written communication skills
Working knowledge of MS Excel
Ability to manage complex issues and foster consensus among teams
Familiar with government code of practice, regulations, current Good Manufacturing Practice (cGMP), Good Documentation Practice (GDP) and Data Integrity (DI)
Good Mechanical Maintenance Troubleshooting, Repairs and Analysis Skills
Good Facilitation and Communication skills
Demonstrated problem-solving and relationship management skills

Job Responsibility

Maintain site-wide equipment and facilities, establishing optimization initiatives and a Right First Time strategy/technique to achieve high equipment and instrument reliability in compliance with cGMP, EHS, data integrity, and regulatory requirements in a cost-effective manner
Execute the reliability programs for plant mechanical equipment, instruments, and systems by establishing proactive, predictive, and preventive maintenance programs, conducting equipment inspections, analyzing data, providing recommendations, and following up with monitoring and improvements
Accountable for: cGMP and EH&S compliance
Mechanical System / Equipment failure and Root Cause Analysis
Equipment, instrument and system reliability and performance
Team performance
Maintenance PM work planning and data tracking
Report generation
Implementation, Execution, Commissioning/testing of new and obsolete Mechanical systems and equipment

Fulltime

Senior Site Reliability Engineer - Fleet Reliability

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...

Location

United States, San Francisco

Salary:

230000.00 - 345000.00 USD / Year

Lambda

Expiration Date

Until further notice

Requirements

7+ years of experience in Site Reliability Engineering, DevOps, or a similar role
Strong understanding of modern AI infrastructure, from GPU architectures to hardware performance optimization
Strong understanding of Linux-based systems in a distributed environment
Solid understanding of Python and Go, with experience working with SWE teams to improve internal tooling
Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, SumoLogic)
Proficiency in automation and configuration management tools (e.g., Ansible, Terraform)
Familiarity with cloud platforms (e.g., OCI, AWS, GCP, Azure)
Excellent problem-solving and troubleshooting skills
Strong communication and collaboration skills
Passion for continuous improvement and innovation

Job Responsibility

Define Fleet Health metrics and indicators to objectively measure and improve system availability
Collaborate with the observability team on comprehensive monitoring and alerting systems to proactively predict, detect and respond to issues or anomalies
Create runbooks and automated remediations for common failure scenarios
Build in automation and auditing to ensure compliance and improve efficiency and productivity
Participate in on-call rotations and provide support for incident response and resolution
Implement and integrate logging and metrics across platforms such as Datadog, Prometheus, OpenTelemetry, Grafana, and SumoLogic

What we offer

Generous cash & equity compensation
Health, dental, and vision coverage for you and your dependents
Wellness and commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible paid time off plan

Fulltime

Senior Reliability Engineer - AV Labs

We are looking for a hardware focused Senior Reliability Engineer to focus on se...

Location

United States, Sunnyvale

Salary:

180000.00 - 200000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

5+ years of relevant industry experience in software engineering, site reliability, or systems engineering
Experience with modern observability platforms (e.g., Prometheus, Grafana, ELK) in edge, IoT, or hardware-integrated environments
Coding skills in one or more of Go, Python, or C++, with experience building and operating production systems
Proficiency in Linux internals and shell scripting for triaging and debugging edge devices or hardware-adjacent systems
Ability to debug across services, containers (Docker), and networking stacks
Proven track record owning reliability, infrastructure, or platform systems for large-scale production workloads
Experience designing and operating observability systems (metrics, logging, alerting, and dashboards)
Experience defining and implementing SLIs and SLOs for system availability or data yield
Deep understanding of networking protocols (TCP/IP, gRPC, or MQTT) and data handling in bandwidth-constrained environments
Experience driving complex technical projects and architectural reviews across multiple teams from design through production

Job Responsibility

Architect Observability Systems: Design and scale an observability platform capable of ingesting and analyzing real-time health telemetry from thousands of distributed vehicle nodes
Build for Edge Constraints: Develop systems that remain performant despite hardware diversity, intermittent connectivity, and rapid fleet scaling
Define Criticality Models: Establish alerting strategies that distinguish transient anomalies from systemic issues impacting sensor uptime and data yield
Detect Complex Failure Modes: Design detection logic for 'silent' failures, such as sensor degradation, compute saturation, or recording pipeline stalls
Scale Through Automation: Design automated detection, triage, and mitigation mechanisms to eliminate manual intervention as the fleet grows
Partner on Mitigation: Collaborate with Operations and Engineering to build safe, automated responses to recurring hardware and software failure scenarios
Drive Operational Efficiency: Build technical interfaces to help Operations surface issues and Engineering diagnose and deploy mitigations rapidly (TTD/TTM)
Lead Technical Strategy: Drive reliability-focused design reviews and translate operational pain points into concrete technical requirements and roadmaps
Uncover Proactive Insights: Apply advanced data analytics to identify latent patterns in fleet telemetry, enabling the proactive detection of systemic regressions and hardware degradation before they impact operations

What we offer

Uber's bonus program
Equity award and other types of compensation
401(k) plan
Various benefits

Fulltime

Senior Engineer, Reliability (Mechanical)

Location

Malaysia, Manjung

Salary:

Not provided

Airswift Sweden

Expiration Date

Until further notice

Requirements

Engineering Background: Preferably Mechanical, Electrical, or related disciplines
Experience: Minimum 8 years (at least 5 years in a relevant reliability or maintenance role)
Industry Exposure: Candidates from O&L, heavy machinery, mining, or cement industries are highly preferred
Strong background in mechanical conveyors and bulk material handling systems
Ability to interpret technical drawings and support simple fabrication needs
Familiarity with steel fabrication, welding, and vibration analysis
Proficient in root cause analysis (RCA), FMEA, LDA, and other reliability tools
Strong data analysis capabilities, especially using CMMS (e.g., SAP)
Coordinate across multiple teams: execution, planning, process inspection
Lead and develop a multi-skilled team, including mechanical and electrical engineers

Job Responsibility

80% strategic focus on equipment lifecycle cost optimization
20% tactical involvement in daily maintenance and reliability operations

Fulltime

Senior Reliability Engineer - PCBA, Harness & Connectors

We are looking for a Senior Reliability Engineer in charge of developing and exe...

Location

United States, San Jose

Salary:

150000.00 - 225000.00 USD / Year

Figure

Expiration Date

Until further notice

Requirements

5+ years of experience in relevant reliability engineering areas
Bachelor's degree or higher in relevant science and engineering fields
Strong knowledge of environmental reliability test principles, models, and methodologies, such as high temperature high humidity, thermal cycle/shock, mechanical vibration/shock
Strong knowledge of industry test standards such as AEC-Q, JEDEC, and IPC
Strong knowledge of electrical circuits, PCBA design and relevant SW tools (e.g. Altium)
Strong knowledge of PCBA, harness and connector failure modes, mechanisms, and FA techniques
Hands-on experience with field reliability risk analysis and failure prediction methods
Hands-on experience with Weibull++, JMP, or other reliability statistical analysis software
Hands-on experience with electronic circuit debugging and relevant tools (e.g. source meter, oscilloscope)
Hands-on experience with a 3D CAD tool (e.g. CATIA)

Job Responsibility

Work with cross-functional teams and own hardware reliability requirements and validation strategy
Develop and execute accelerated life tests for PCBAs, electronic components, electrical harness and connectors
Lead DFMEA efforts with design engineers to assess design risks, impacts, controls, and corrective actions
Design reliability test flows and procedures, communicate with internal and external/CM teams to execute tests and report results
Work with test engineers to design setup and fixtures used in reliability testing
Guide and support PCBA, harness, connector failure analysis, design of experiments (DOEs) and corrective action processes with cross-functional teams
Analyze field data, assess field risks, and design tests that correlate to field usage conditions

Fulltime

Senior Site Reliability Engineer

The Senior Site Reliability Engineer establishes and maintains the infrastructur...

Location

United Kingdom; United States; Canada

Salary:

Not provided

Mozilla

Expiration Date

Until further notice

Requirements

7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management
Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi
Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls
Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early
Excellent async written communication skills; comfortable working with a geographically distributed team
Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency
Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes

Job Responsibility

Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives
Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows
Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts
Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design
Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation
Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems
Participate in on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding
Contribute to runbooks, architecture documentation, and team processes

What we offer

Fully remote work & schedule flexibility
Company-provided laptop
Annual bonus program
Monthly remote work stipend
Annual professional development stipend
Industry conferences
Company all-hands and team gatherings
24 days PTO per year (prorated)
Birthday
Year-end company shutdown

Fulltime
