Software Engineer - Reliability

Linux Recruit

Location:
United Kingdom, North West

Contract Type:
Not provided

Salary:

45000.00 - 55000.00 GBP / Year

Job Description:

You're a software engineer who enjoys solving complex engineering problems. Your default is to engineer reliable software that contributes to the overall performance of systems. This opportunity offers exactly that challenge. You will sit at the intersection of software engineering and platform reliability, giving you the chance to code solutions that ensure critical systems run smoothly, efficiently and continuously.

You will design and build internal tools that help development and operations teams run systems at scale. Using languages such as Python, Golang or JavaScript, you will develop automation, monitoring and performance solutions that reduce manual effort and increase operational efficiency. This is an opportunity to work with modern technologies across the full software development lifecycle, from development through automated pipelines into cloud-native container environments. You will apply engineering principles to reliability challenges, creating meaningful improvements to systems that must operate at high speed and maintain near-perfect uptime.

Collaboration is central to how the teams work. You will partner with platform engineers, developers and operations specialists to solve problems and implement solutions that improve the stability and scalability of the organisation’s most critical systems. This environment encourages engineers to think creatively, contribute ideas and continuously improve the way technology supports the business.

In return, you will join a team that values engineering excellence and invests in its people. The role offers a strong benefits package including a generous pension and holiday, a bonus and free gym membership. With hybrid working available, you will also have the flexibility to balance focus time with valuable collaboration across teams.

For engineers who want to move beyond writing features and instead build the systems that keep an entire platform running reliably at scale, this role provides the perfect next step.

Job Responsibility:

  • Design and build internal tools for development and operations teams
  • Develop automation, monitoring and performance solutions
  • Apply engineering principles to reliability challenges
  • Partner with platform engineers, developers and operations specialists to improve system stability and scalability

Requirements:

  • Experience in software engineering
  • Proficiency in Python, Golang or JavaScript
  • Experience with automation, monitoring and performance solutions
  • Knowledge of cloud-native container environments
  • Understanding of full software development lifecycle
  • Collaboration with platform engineers, developers and operations specialists

What we offer:

  • Generous pension
  • Holiday
  • Bonus
  • Free gym membership
  • Hybrid working

Additional Information:

Job Posted:
April 24, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Software Engineer - Reliability

Senior Software Engineer, Site Reliability

Babylist is looking for a Senior Software Engineer, Site Reliability to join our...
Location
United States; Canada
Salary
186818.00 - 224183.00 USD; CAD / Year
Babylist
Expiration Date
Until further notice
Requirements
  • 8+ years of experience as a Site Reliability Engineer or similar role
  • Experience supporting high-traffic consumer-facing websites
  • Proficiency with Terraform
  • Strong experience working with AWS cloud-based infrastructure and services
  • Proficiency with Docker and Kubernetes
  • Solid understanding of cloud-native systems design
  • Troubleshooting and debugging skills
  • Experience designing and supporting CI systems
  • Familiar with monitoring and alerting best practices
  • Proven experience in on-call management best practices
Job Responsibility
  • Manage and build our AWS infrastructure using Infrastructure as Code (IaC) tools like Terraform
  • Improve the speed and reliability of our Continuous Integration (CI) systems
  • Provide support to developers in troubleshooting issues
  • Establish, communicate, and support best practices for monitoring and alerting
What we offer
  • Company-paid medical, dental, and vision insurance
  • Retirement savings plan with company matching and flexible spending accounts
  • Generous paid parental leave and PTO
  • Remote work stipend
  • Perks for physical, mental, and emotional health, parenting, childcare, and financial planning
Employment Type
Full-time

Software Engineer, Site Reliability

As a Site Reliability Engineer (SRE) at Fireworks AI, you will play a critical r...
Location
United States, San Mateo
Salary
Not provided
Fireworks AI
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems
  • Deep expertise in SRE principles and practices, including SLOs, SLIs, operational automation, incident management, and post-mortems
  • Extensive hands-on experience with public cloud platforms (AWS, GCP, Azure), including compute, networking, storage, and database services
  • Strong experience with containerization technologies (Docker) and orchestration platforms (Kubernetes)
  • Proficiency in designing and implementing robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, ELK stack, and distributed tracing
  • Solid programming/scripting skills in at least one language (e.g., Python, Go) for automation and tool development
  • In-depth knowledge of Linux operating systems, networking fundamentals, and system debugging
  • Proven ability to troubleshoot complex issues across the entire stack
  • Excellent communication, collaboration, and problem-solving skills
Job Responsibility
  • Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance. Focus on fault tolerance, disaster recovery, identifying and removing scaling bottlenecks, and performance optimization across our multi-cloud infrastructure
  • Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues. Drive post-mortems to identify root causes and implement preventative measures to improve system reliability
  • Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance
  • Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency. Develop tools and scripts to streamline deployments, scaling, and system management
  • Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization
  • Reliability Best Practices: Collaborate with software engineers to embed reliability principles (e.g., SLOs, SLIs, error budgets) into the development lifecycle, promoting a culture of operational excellence
  • On-call Rotation: Participate in a periodic on-call rotation to support our production environment and respond to critical alerts
Employment Type
Full-time

MTS Software Architecture - Reliability Engineering

Our team is searching for a Full Stack Member of Technical Staff to collaborate ...
Location
United States, Frisco; Atlanta; Overland Park
Salary
145400.00 - 262300.00 USD / Year
T-Mobile
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in computer science, engineering or a related field of study
  • 9+ years technical engineering experience, including full-stack web development (front-end and back-end)
  • 7+ years of experience in database schema design and writing SQL
  • 3+ years DevOps experience, including infrastructure as code
  • 4+ years hands-on experience with cloud services (AWS, Azure, GCP)
  • 3+ years experience mentoring and coaching team members
  • Expertise in multiple technologies and software stacks
  • Strong understanding of cloud capabilities and how to optimize them for team success
  • Ability to set up a completely new full stack environment from scratch including build steps and backend infrastructure
  • Proficiency in HTML, CSS, webpack, JavaScript, at least one front end framework and one backend framework
Job Responsibility
  • Imagines, designs and builds full stack web solutions including both the back end and front end
  • Code Review and mentoring of other team members
  • Imagines, designs and builds advanced scheduled jobs and micro-services defining new patterns and orchestrations
  • Imagines, designs and implements advanced data storage mechanisms using relational and non-relational data stores
  • Explores, builds and configures cloud services using infrastructure as code. Recommends new cloud services and patterns
  • Presents ideas which improve an existing system/process/service. Presents new ideas which utilize new frameworks to improve an existing system/process/service
  • Collaborates with team to break down features into user stories and estimate them
  • Awareness of technology roadmap
  • Updates job knowledge by tracking and understanding emerging engineering practices
  • Continuously learns, creates content, and teaches others specific subject areas
  • Informally coaches and contributes to the development of others through mentoring or in-house workshops and learning sessions
  • Coaches and develops engineers across functional teams on technology decisions
  • Influences technology and policy decisions made at Director+ level across the organization
  • Understands financial decisions, including NPV and ROI, based on customer experience/business drivers
  • Presents highly technical concepts to both technical and non-technical decision-makers
  • Provides direction on creation of reliability practices, metrics and tooling based on industry best practices and incident data
What we offer
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Medical, dental and vision insurance
  • Flexible spending account
  • Employee stock grants
  • Paid time off
Employment Type
Full-time

Sr. Engineer II, Software Engineering FE

At CVS Health, we’re building a world of health around every consumer and surrou...
Location
United States, Chicago
Salary
148949.00 - 180000.00 USD / Year
CVS Health
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, or related field
  • Six (6) years of progressively responsible, post-baccalaureate experience in a related occupation
  • Experience in building consumer-facing products using any SPA framework (React/Vue)
  • Experience with a design-first approach to software development
  • Experience in writing Jest / Vitest unit tests and achieving close to 100% code coverage
  • Experience working in an Agile/DevOps environment
Job Responsibility
  • Contribute to all aspects of SDLC process (SCRUM, Design, Code, Test, Deploy & Maintain)
  • Collaborate with Product, UX and other Engineering teams
  • Collaborate with Platform team following Architecture best practices for scalability and reliability
  • Contribute to code review process to improve code quality
  • Mentor Engineers
  • Implement SecDevOps best practices
  • Other duties as assigned
What we offer
  • Full range of medical, dental, and vision benefits
  • 401(k) retirement savings plan
  • Employee Stock Purchase Plan
  • Fully-paid term life insurance plan
  • Short-term and long-term disability benefits
  • Well-being programs
  • Education assistance
  • Free development courses
  • CVS store discount
  • Discount programs with participating partners
Employment Type
Full-time

Software Engineer II, Android Engineering

As a Software Engineer on Axon’s Robotics team, you’ll be at the forefront of tr...
Location
United States, Boston
Salary
120750.00 - 193200.00 USD / Year
Axon
Expiration Date
Until further notice
Requirements
  • 3+ years of industry experience shipping Android applications to the Google Play Store
  • Understand the ins and outs of mobile phones
  • Expected to lead mobile design reviews as well as the implementation of their designs through release and post-release monitoring
  • Experience with modern architecture (MVVM, MVI, etc.) including unit testing
  • Android experience with Retrofit, Coroutines, OkHttp, Hilt, Jetpack Compose
  • Experience working with remote data via REST and JSON
  • Understanding and experience with networking protocols such as TCP, UDP, DHCP, DNS, Server-Sent Events, WebSockets (debugging with Wireshark or Charles a plus)
Job Responsibility
  • Lead engineering architecture and design reviews to ensure high standards in software quality
  • Collaborate with the Axon product design team to turn mobile UI designs into functional, engaging solutions
  • Drive the entire mobile software lifecycle, from prototyping to commercialization and post-launch support
  • Interface with cloud services for seamless integration across platforms
  • Set a high technical standard for the team through code and design reviews
  • Partner with Product, Design, and Engineering teams to deliver integrated solutions that meet customer needs
  • Enhance engineering processes, including sprint planning, stand-ups, and long-term planning
  • Build robust, reliable software that meets high standards for stability in mission-critical applications
  • Collaborate closely with other groups to align on goals, ensuring we deliver impactful and innovative solutions
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Development Programs
  • Snacks in our offices
Employment Type
Full-time

Principal Site Reliability Engineer

Location
United States, Ft. Meade
Salary
Not provided
CipherLogix
Expiration Date
Until further notice
Requirements
  • Fourteen (14) years experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution
  • Ten (10) years experience in system engineering/architecture
  • Ten (10) years experience working with products that support highly distributed, massively parallel computation needs such as HBase, Hadoop, CloudBase/Accumulo, Bigtable, Cassandra, Scality, etc.
  • At least ten (10) years experience writing software scripts using scripting languages such as Perl, Python, or Ruby for software automation
  • At least four (4) years experience managing and monitoring a large cloud system (>200 nodes). Cloud Systems Administrator or Developer Certification
  • Experience in performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems to include monitoring technical health of a system, improving organizational processes, implementation of postmortem (failure) analysis and incident management
  • Ten (10) years experience in the cleared environment
  • Ten (10) years demonstrated experience developing software for one of the following: Windows, UNIX, or Linux OS
  • Knowledge and experience with developing distributed storage routing and querying algorithms
  • Experience in developing documentation required to support a program’s technical issues and training situations
Employment Type
Full-time

Staff Software Engineer, Compute

Play a key role in building our platform from zero to one. Partner across teams ...
Location
United States
Salary
200000.00 - 275000.00 USD / Year
dbt Labs
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in software engineering, with expertise in database systems, query engines, or storage systems
  • Strong coding skills at the systems level (C++, Rust, Go, Python, or Java)
  • Experience designing and scaling distributed systems or SaaS platforms
  • Expertise with cloud infrastructure (AWS, GCP, Azure, Kubernetes, Terraform)
  • Proven ability to lead complex projects and collaborate across functions
  • Excellent problem-solving skills, clear communication, and a strong sense of ownership
Job Responsibility
  • Design, build, and maintain the Compute layer that powers dbt’s ability to optimize queries across ingestion, transformation, and consumption
  • Lead technical architecture discussions with a focus on query engines, storage systems, and distributed database design
  • Collaborate with Product, Design, Operations, and Security to deliver well-architected, scalable compute solutions
  • Build services, APIs, and experiences that support user delight, quality, high availability, and performance
  • Tackle ambiguous, open-ended technical challenges with strategic thinking, balancing technical constraints with user needs and product goals
  • Define and drive best practices in testing, observability, and system reliability
  • Mentor engineers across the company, fostering technical growth and collaboration
  • Champion a culture of technical excellence and innovation, influencing engineering direction across multiple teams or domains
What we offer
  • Unlimited vacation
  • 401k
  • Pension Plan
  • 16 weeks Paid Parental Leave
  • Wellness stipend
  • Home office stipend
  • Equity Stake
Employment Type
Full-time

Customer Reliability Engineer

As a Customer Reliability Engineer at Endor Labs on our Customer Success team, y...
Location
United States
Salary
Not provided
Endor Labs
Expiration Date
Until further notice
Requirements
  • Strong background in software engineering, with 4-10 years of deep understanding of programming languages, application security, and DevOps practices
  • Demonstrated experience in developing custom technical solutions and actively engaging in customer-facing roles, with a proven ability to handle project-based work effectively
  • A passionate advocate for customer success, with a focus on building secure, scalable solutions from the ground up
  • Exceptional communication skills, capable of breaking down complex technical topics into clear, understandable terms for a variety of audiences
  • Proactive and anticipatory approach to problem-solving, with the ability to foresee customer needs and craft strategic solutions that align with their overarching goals
Job Responsibility
  • Own technical escalations from Customer Success Engineers, Solution Architects and Implementation Engineers ensuring swift reproduction and resolution of critical issues
  • Collaborate with Engineering and Product teams to triage and resolve bugs or architectural issues
  • Provide insight and build closely with our engineering teams, translating customer feedback and troubleshooting insights into tangible product improvements
  • Act promptly when technical issues emerge, applying your advanced troubleshooting skills and understanding of programming and DevOps practices to ensure our customers are successful
  • Conduct deep diagnostics, including logs, APIs, and infrastructure troubleshooting
  • Serve as a bridge between the customer and R&D for complex or systemic issues
  • Document and share solutions for long-term knowledge management and root cause prevention
What we offer
  • Competitive salary and comprehensive benefits package including Health, Dental, Vision and Mental Health plans
  • 401(k) plan to support your long-term financial goals
  • Flexible PTO to maintain a healthy work-life balance
  • Opportunities for co-working and team meetups to foster collaboration
  • A dog-friendly office environment for those who love to bring their fur babies along