Software Engineer SRE

OnePay

Location:
United States

Contract Type:
Not provided

Salary:

140000.00 - 180000.00 USD / Year

Job Description:

As a Site Reliability Engineer at OnePay, you will play a critical role in ensuring the stability, scalability, and security of the systems that power our financial products, driving reliability practices across infrastructure, platform, and application teams to support millions of customers.

Job Responsibility:

  • Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
  • Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
  • Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
  • Partner closely with product and platform engineering teams to embed reliability best practices in design, development, and deployment processes
  • Lead root cause analysis and postmortems, driving long-term improvements in resiliency and fault tolerance

Requirements:

  • 5+ years of experience as a Software Engineer with a focus on building and running reliable, large-scale, distributed systems in production
  • 5+ years of operational experience in observability tooling and libraries (metrics, logging, tracing) with experience using Datadog or similar tools (Prometheus, Grafana)
  • Proficiency in at least one programming language (Python, Go, Java, or Node.js preferred) for automation and tooling
  • Proficiency in incident management, going on-call, and writing post-mortem reports
  • Excellent collaboration skills with the ability to influence and educate product engineering teams on reliability and observability best practices
  • Hands-on experience with cloud platforms (AWS preferred), container orchestration (Kubernetes), and IaC tools (Terraform, Pulumi)
  • Drive and proactivity – everyone here is a builder and executor

What we offer:
  • Competitive base salary, stock options, and health benefits from Day 1
  • 401(k) plan with company match
  • Remote-friendly (US), flexible time off (FTO), and opportunities for growth
  • A high-growth, mission-driven, inclusive culture where your work has real impact

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time

Work Type:
Remote work

Similar Jobs for Software Engineer SRE

Intermediate Software Engineer SRE – AI

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location:
Canada, Mississauga
Salary:
115000.00 - 128000.00 CAD / Year
PointClickCare
Expiration Date:
Until further notice
Requirements:
  • 5+ years' experience in software engineering
  • Experience with SRE principles
  • Experience with AI/ML in production environments
  • A passion for automation, intelligent systems, and operational excellence
  • Strong debugging, problem-solving, and system design skills
  • Languages: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
  • CI/CD: Jenkins, ArgoCD, Spinnaker
Job Responsibility:
  • Build ML-based anomaly detection and pattern recognition systems
  • Enhance telemetry with smart tagging and metadata for better AI insights
  • Develop event-driven workflows and self-healing systems using AI triggers
  • Automate incident response with generative AI and custom AI agent orchestration
  • Use time-series forecasting and predictive modelling to anticipate failures
  • Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
  • Build scalable, fault-tolerant systems in a cloud-native environment
  • Participate in on-call rotations and lead incident response for critical systems
  • Skilled in API integration for streamlined data exchange and system connectivity
  • Run internal AIOps workshops and help teams adopt AI maturity models
What we offer:
  • Benefits starting from Day 1
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more
Employment Type:
Full-time

Software Engineer

Picture this. You walk into the office on Monday, and by Friday, you have shippe...
Location:
United Kingdom, London
Salary:
80000.00 - 120000.00 GBP / Year
Linux Recruit
Expiration Date:
Until further notice
Requirements:
  • Thrive in fast-paced environments
  • Comfortable with ambiguity
  • Want to join a high-calibre team that is hungry to build
  • Love tackling complex systems
  • Working directly with customers
  • Building software that has an immediate impact
Job Responsibility:
  • Solve meaningful problems with speed
  • Get real ownership
  • Get real responsibility
What we offer:
  • Competitive salary
  • Generous equity
  • Front row seat to a company about to take off
Employment Type:
Full-time

Senior Software Engineer, Infrastructure

The InfraOps team’s primary goal is to enable and empower Kiddom’s engineering b...
Location:
United States, New York City
Salary:
160000.00 - 200000.00 USD / Year
Kiddom
Expiration Date:
Until further notice
Requirements:
  • BS or MS in Computer Science or a related field
  • 5+ years professional software engineering experience
  • Experience with Java, Python, Go, or Clojure in a production environment
  • Experience designing and building REST APIs
  • Exposure to authorization technologies (OAuth)
  • Experience with continuous integration and automation tools and processes
  • Strong knowledge of design patterns and software engineering best practices
  • Excellent problem solving and debugging skills
  • Strong acumen in, or exposure to, DevOps or SRE methodologies
  • Keen sense for SecOps
Job Responsibility:
  • Evangelizing and fostering a healthy DevOps culture here at Kiddom, working with teams to establish best practices and help guide new and existing services
  • Practicing Infrastructure as Code (IaC) wherever possible, giving us confidence in repeatable processes that can be automated
  • Grow our DevOps efforts from small scale to large-scale, multi-region
  • Share ownership of the entire infrastructure architecture
  • Aim for high availability, high resiliency
  • Support the engineering team with tools to evaluate the performance of their code in production environments, speed up the CI/CD pipeline, and verify features
  • Design and build a scalable, generalized framework for third-party API integrations
  • Leverage existing infrastructure and components to build RESTful web services
  • Build APIs and robust testing environments for internal and external developers
What we offer:
  • Competitive salary
  • Meaningful equity
  • Health insurance benefits: medical (various PPO/HMO/HSA plans), dental, vision, disability and life insurance
  • One Medical membership (in participating locations)
  • Flexible vacation time policy (subject to internal approval). Average use is 4 weeks off per year.
  • 10 paid sick days per year (prorated depending on start date)
  • Paid holidays
  • Paid bereavement leave
  • Paid family leave after birth/adoption. Minimum of 16 paid weeks for birthing parents, 10 weeks for caretaker parents. Meant to supplement benefits offered by State.
  • Commuter and FSA plans
Employment Type:
Full-time

Senior Software Engineer, Infrastructure

You’ll help shape the future of infrastructure automation for law enforcement sy...
Location:
United States, Seattle; Boston
Salary:
141000.00 - 225600.00 USD / Year
Axon
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 8+ years of professional software development experience
  • Strong background building cloud-native, distributed solutions
  • Experience designing tooling and automation to simplify the operational management of SaaS/PaaS systems
  • Proficiency in backend services with multiple managed languages (e.g., Java, Scala, Go, C#, or similar)
  • Expertise with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and building modular, reusable, testable components
  • Familiarity with Kubernetes platforms (e.g., AKS, EKS, or similar)
  • Hands-on experience with CI/CD platforms for automating infrastructure, builds, testing, and releases
  • Strong collaboration and communication skills, with empathy for the needs of engineering teams
Job Responsibility:
  • Lead engineering architecture design reviews
  • Set a high technical bar for the team through code and architecture design reviews
  • Mentoring engineers
  • Working across teams with Product, Design, and Engineering to create integrated solutions that delight our customers
  • Improve our Engineering process, including long-term thinking, sprint planning and stand-ups
  • Building services that adhere to our high bar on availability and latency in this mission-critical space
  • Working with the latest open source technologies
What we offer:
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Employment Type:
Full-time

Software Engineering Specialist

The role is accountable for ensuring that our technical deliveries realise Busin...
Location:
India, Bengaluru
Salary:
Not provided
Plusnet
Expiration Date:
Until further notice
Requirements:
  • Deep knowledge of the networking domain, along with a solid understanding of the telecom OSS stack, including planning, monitoring, and assurance
  • Strong grasp of TMF standards, with API-based solutions and event-based architecture patterns
  • Strong foundation in ODA architecture patterns
  • Experience designing solutions from an engineering point of view with TMF-compliant and ODA architecture
  • Skilled in life cycle management of OSS tools/solutions including requirements analysis, platform selection, technical architecture design, application design & development, testing and deployment
  • Knowledge of various industry standards such as TMF and Open API
  • Lead and execute engineering initiatives to ensure the network cloud platform is easily consumable by products and solutions that are built on top of the platform and, at the same time, is compliant with information security standards
  • Implement governance and controls to monitor and manage consumption and compliance with security and other standards
  • Implement and publish APIs for clients to consume platform services in a consistent way
Job Responsibility:
  • Implement the defined architectural roadmap for the Assurance Area for the following: Fault Management, Resource Management, Incident Management, and Change Management
  • Role involves defining and implementing the roadmap for Transformation of IT, DataCenter and Network Cloud applications in Service and Problem management
  • Manage, Engineer, Architect, Develop and Maintain applications in Network Management, OSS and FCAPS space
Employment Type:
Full-time

Software Engineer II - Product and Solution Engineering

We are seeking a resourceful, versatile Software Engineer to join our Profession...
Location:
India, Chennai
Salary:
Not provided
Arcadia
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in a software engineering role
  • Good programming skills in one or more programming languages such as Python and SQL
  • Should be able to write clean code independently
  • Good hands-on work experience with API design
  • Good hands-on work experience with SQL
  • Ability and internal drive to problem-solve, both creatively and pragmatically
  • Ability to learn new technologies quickly and pick up the domain over a period of time
  • Passion for our mission, sustainability, and driving a clean-energy future
Job Responsibility:
  • Write integrations, refactor scripts and code to help Arcadia efficiently collect and deliver data
  • Partner with CDI, CS, Product, SRE, InfoSec, Data Engineering and Analytics to deliver data to our customers and enterprise partners on time, with accuracy and quality, while meeting SLAs
  • Integrate and work with robust, scalable back-end systems, via SQL databases, internal and external APIs
  • Work with database technologies and query databases to retrieve data of interest to our customers and partners
  • Work on security aspects of integration and ensure the safety of our customer data
  • Frequently deploy new functionality to production with a streamlined CI/CD pipeline
  • Explore new technologies with an open-minded team
  • Increase test coverage and reliability and help troubleshoot production issues
  • Collaborate frequently with other engineers
  • Notice and speak up about opportunities to improve experiences for our customers and partners
What we offer:
  • Competitive compensation and employee stock options
  • Hybrid/remote-first working model (India-based role, with global collaboration)
  • Flexible leave policy
  • Comprehensive medical insurance (self + family members)
  • Annual performance cycle + quarterly recognition awards
  • A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation
Employment Type:
Full-time

Software Engineer, Site Reliability

As a Site Reliability Engineer (SRE) at Fireworks AI, you will play a critical r...
Location:
United States, San Mateo
Salary:
Not provided
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems
  • Deep expertise in SRE principles and practices, including SLOs, SLIs, operational automation, incident management, and post-mortems
  • Extensive hands-on experience with public cloud platforms (AWS, GCP, Azure), including compute, networking, storage, and database services
  • Strong experience with containerization technologies (Docker) and orchestration platforms (Kubernetes)
  • Proficiency in designing and implementing robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, ELK stack, and distributed tracing
  • Solid programming/scripting skills in at least one language (e.g., Python, Go) for automation and tool development
  • In-depth knowledge of Linux operating systems, networking fundamentals, and system debugging
  • Proven ability to troubleshoot complex issues across the entire stack
  • Excellent communication, collaboration, and problem-solving skills
Job Responsibility:
  • Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance. Focus on fault tolerance, disaster recovery, identifying and removing scaling bottlenecks, and performance optimization across our multi-cloud infrastructure
  • Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues. Drive post-mortems to identify root causes and implement preventative measures to improve system reliability
  • Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance
  • Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency. Develop tools and scripts to streamline deployments, scaling, and system management
  • Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization
  • Reliability Best Practices: Collaborate with software engineers to embed reliability principles (e.g., SLOs, SLIs, error budgets) into the development lifecycle, promoting a culture of operational excellence
  • On-call Rotation: Participate in a periodic on-call rotation to support our production environment and respond to critical alerts
Employment Type:
Full-time

Software Engineer

As a Site Reliability Engineer (SRE) you will actively work to improve the perfo...
Location:
United States
Salary:
116700.00 - 187400.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Strong scripting experience
  • Serious troubleshooting skills across different levels of the stack
  • Engage in capacity planning, demand forecasting, software performance analysis, and systems tuning
  • Experience configuring and managing enterprise monitoring solutions
  • Understanding of Linux systems
  • Building, automating, and maintaining infrastructure in Amazon Web Services
  • Maintaining a high standard of code quality
Job Responsibility:
  • Improve the performance and reliability of services
  • Address root causes of incidents and reduce incident rates
  • Deep dive into the services we support and own the problem and the corresponding solution
  • Automate away repetitive work
  • Respond to pings, pages, and alerts to investigate issues in our systems
  • Serve in a weekly on-call rotation to make sure our products meet established SLAs
What we offer:
  • Health coverage
  • Paid volunteer days
  • Wellness resources
Employment Type:
Full-time