Staff Software Engineer, Reliability

United States, Austin · 160200.00 - 290700.00 USD / Year · Job Posted March 03, 2026

Job Description

The AV platform team develops the first layers of software on GM autonomous vehicles, from working with hardware to moving large amounts of data up the software stack. Within this, the Autonomy Interface SW team develops environmental sensing solutions on multiple vehicle platforms. As a Staff Software Engineer, you are an expert professional identifying and pursuing new paths of inquiry at GM. As GM’s AV business continues to scale rapidly, building a stable, scalable, flexible, cost-efficient, and reliable foundation is critical. This role will specifically work on the multi-sensor system services and frameworks in collaboration with our partner teams across GM.

Job Responsibility

  • Collaborate with hardware, systems engineering, program management, product management and peer software teams to develop critical reliability software features for the autonomous vehicle
  • Perform root-cause analysis of complex problems involving multiple cross-functional partners, including hardware and software
  • Identify reliability issue trends, provide clear guidance on reliability requirements, develop reliability design guidelines, and apply lessons learned to enable continuous improvement
  • Design and implement shared infrastructure and tooling among the AV Platform teams to monitor and analyze embedded software and data quality metrics
  • Own the development quality and ensure the solutions are scalable, secure, and optimized for customer experience and performance
  • Partner with cross-functional teams to architect and implement embedded software observability and monitoring solutions
  • Work with the engineering teams to architect and build services to simplify troubleshooting and operational response to incidents and autonomous vehicle fleet outages
  • Own technical projects, participate in design reviews and provide input for the reliability section of others’ design reviews
  • Ensure efficiency of the vehicle change process involving embedded software changes and dependencies
  • Participate in on-call rotation
  • Maintain a strong focus on collecting and inferring metric documentation that others use to build and maintain the system
  • Contribute to the roadmap and software planning activities within the team, helping drive the vision of how the team should evolve
  • Guide and mentor developers on the team

Requirements

  • 6+ years of professional experience with multi-sensor system services and frameworks
  • Bachelor’s degree in a relevant field or relevant work experience
  • Proven experience writing production software to improve data quality and reliability of safety critical systems including root cause and corrective actions
  • Proficiency with C++11 or later and Python
  • Proficiency in debugging and troubleshooting firmware-related issues
  • Experience driving complex embedded software projects through the full lifecycle of product development
  • Experience architecting and delivering Embedded Systems solutions that support multiple generations of the product
  • Experience engaging in communication at senior management levels and influencing technical strategies
  • Experience applying software development best practices and mentoring team members on them
  • Clear and concise written and verbal communication skills

Nice to have

  • 8+ years of professional experience with multi-sensor system services and frameworks
  • Experience with safety-critical development (FDA, FAA, Automotive)
  • Familiarity with reliability engineering principles (FMEA, FTA, and other reliability assessment techniques)
  • Knowledge of relevant functional safety industry standards and regulations
  • Experience with different types of sensors and environmental sensing systems
  • Knowledge of embedded software testing methodologies and tools as well as quality assurance processes and methodologies
  • Experience developing on and for Embedded Linux / POSIX systems

What we offer

  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • employee assistance program
  • GM vehicle discounts
  • company vehicle evaluation program
  • incentive pay program
  • relocation benefits

Similar Jobs for Staff Software Engineer, Reliability

8 matching positions

Staff Engineer, Software Reliability Engineering

We are seeking a Staff Engineer to join our dynamic team in Bengaluru, India. In...
Location: India, Bengaluru
Salary: Not provided
Company: Sandisk
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in CSE or ECE or EEE, Software Engineering, or related field
  • Master's degree preferred
  • 5 years of software development experience in Python scripting and test case development
  • Advanced proficiency in programming languages such as Java, Python, or C++
  • Proficient in version control systems, preferably GitHub
  • Solid understanding of software architecture and design patterns
  • Experience with API development and integration
  • Strong skills in performance optimization and debugging
  • Experience with Agile methodologies and full software development lifecycle
  • Excellent problem-solving and analytical skills
Job Responsibility
  • Architect, design, and implement a high-performance, scalable test suite for reliability testing
  • Collaborate with cross-functional teams to define and implement new features and products
  • Lead code reviews and provide mentorship to junior developers
  • Optimize test performance and ensure high-quality, efficient code
  • Troubleshoot and resolve complex technical issues
  • Stay current with emerging technologies and industry trends, recommending improvements to our technology stack
  • Contribute to the development of technical standards and best practices
  • Participate in Agile ceremonies and help drive continuous improvement in our development processes
Employment type: Full-time

Reliability Staff Software Engineer - OpenSearch

We're seeking a skilled Staff Software Engineer with leadership ambition, to joi...
Location: United Kingdom, London
Salary: Not provided
Company: Optimizely
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree (Computer Science or engineering preferred) or equivalent work experience
  • Significant experience designing, implementing, and maintaining SaaS with high traffic load
  • Several years of experience directly managing scalable and reliable Elasticsearch and/or OpenSearch clusters
  • Experience with TypeScript, JavaScript, C#
  • Experience with GraphQL, REST
  • Experience with Cloudflare workers, Kubernetes
  • Experience with OpenSearch
Job Responsibility
  • Architect, implement, and optimize OpenSearch indexing and query pipelines for scalability and reliability
  • Design and maintain backup, disaster recovery, and failover strategies for OpenSearch clusters
  • Lead root cause analysis and resolution of complex search-related incidents and performance bottlenecks
  • Drive automation for cluster provisioning, upgrades, and configuration management (e.g., with Terraform, Ansible, or Kubernetes)
  • Mentor engineers on OpenSearch internals, query optimization, and troubleshooting
  • Collaborate with product and engineering teams to translate business requirements into robust search features
  • Own capacity planning and cost optimization for search infrastructure
  • Author technical documentation and best practices for search development and operations

Staff Software Engineer - Site Reliability

Ironclad is the leading AI contracting platform that transforms agreements into ...
Location: United States, San Francisco; New York City
Salary: 210000.00 - 235000.00 USD / Year
Company: Ironclad
Expiration Date: Until further notice
Requirements
  • Minimum of 5 years of experience in a Site Reliability Engineering / DevOps role
  • Expert knowledge of Docker and Kubernetes, Crossplane experience is a plus
  • Strong knowledge of cloud platforms such as AWS and Google Cloud
  • Proficiency in scripting and programming languages like Python, TypeScript, or Bash
  • Experience with infrastructure-as-code tools like Terraform or Pulumi
  • Strong troubleshooting and analytical skills, drive to help customers, and the ability to dive deep and learn a new product
  • Experience with CI/CD pipelines and deployment automation tools such as CircleCI and ArgoCD
  • Strong understanding of networking and security principles
Job Responsibility
  • Be part of the Cloud Platform SRE Team, focused on building our Cloud Platform using modern tools and best practices
  • Champion SRE best practices within the team and throughout the organization
  • Ensure the reliability, availability, and performance of services and infrastructure
  • Solve the whole problem. Design, implement, and maintain scalable systems
  • Automate repetitive operational tasks to streamline processes
  • Monitor system performance and troubleshoot issues proactively
  • Develop and document best practices for system operations
  • Collaborate with development teams to enhance system design
  • Manage incident responses and perform root cause analysis
  • Participate in on-call rotations to handle critical issues as they arise
What we offer
  • 100% health coverage for employees (medical, dental, and vision), and 75% coverage for dependents with buy-up plan options available
  • Market-leading leave policies, including gender-neutral parental leave and compassionate leave
  • Family forming support through Maven for you and your partner
  • Paid time off - take the time you need, when you need it
  • Monthly stipends for wellbeing, hybrid work, and (if applicable) cell phone use
  • Mental health support through Modern Health, including therapy, coaching, and digital tools
  • Pre-tax commuter benefits (US Employees)
  • 401(k) plan with Fidelity with employer match (US Employees)
  • Regular team events to connect, recharge, and have fun
  • And most importantly: the opportunity to help build the company you want to work at
Employment type: Full-time

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...
Location: United States, Santa Clara
Salary: 126000.00 - 203500.00 USD / Year
Company: Palo Alto Networks
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
  • Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
  • Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
  • Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
  • Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
  • Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
  • Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
  • Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
  • Strong problem-solving skills and ability to work across teams
Job Responsibility
  • Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
  • Lead improvements across production systems, including performance, availability, and incident response
  • Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
  • Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
  • Partner with development teams to improve system reliability, observability, and cloud-native design patterns
  • Define and implement monitoring, alerting, and observability strategies across distributed systems
  • Lead incident response efforts, including root cause analysis and long-term remediation strategies
  • Identify and eliminate operational toil through automation and system improvements
  • Mentor engineers and contribute to raising the bar for production engineering practices
What we offer
  • restricted stock units
  • bonus
Employment type: Full-time

Staff Software Development Engineer-Automation Engineer

We’re building a world of health around every individual — shaping a more connec...
Location: United States
Salary: 106605.00 USD / Year
Company: CVS Health
Expiration Date: June 29, 2026
Requirements
  • Extensive experience in software development and production support for enterprise systems
  • Strong expertise in automation/RPA platforms, scripting, and debugging complex workflows
  • Proven ability to lead incident response and root cause analysis in high-availability environments
  • Deep understanding of SDLC, CI/CD, release management, and production readiness standards
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
Job Responsibility
  • Serve as the technical owner for production support of automation and RPA solutions across critical business processes
  • Lead incident triage, root cause analysis, and permanent remediation for high-severity automation failures
  • Establish and enforce runbooks, support models, escalation paths, and on-call readiness for automation platforms
  • Proactively identify systemic issues and implement stability, resiliency, and performance improvements
  • Provide hands-on technical leadership for automation design, debugging, and optimization in production environments
  • Review automation code and configurations to ensure adherence to standards, security, and reliability best practices
  • Partner with development teams to ensure production readiness of new automations before release
  • Guide architectural decisions that reduce operational complexity and technical debt
  • Design and maintain monitoring, alerting, and health dashboards for automation platforms
  • Drive adoption of AIOps, SRE, and automation-first support practices where applicable
What we offer
  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
Employment type: Full-time

Staff Software Engineer, Vehicle AI

Work Arrangement: This role is categorized as hybrid. This means the successful ...
Location: United States, Mountain View
Salary: 189300.00 - 290000.00 USD / Year
Company: General Motors
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
  • 8+ years of professional software development experience, with a focus on large-scale distributed systems or AI/ML infrastructure
  • Expert proficiency in one or more programming languages such as Python, C++, Java, or Kotlin
  • Extensive experience designing, building, and deploying production-grade AI/ML models or intelligent agents
  • Demonstrated technical leadership in complex projects, including mentoring and driving cross-functional initiatives
Job Responsibility
  • Lead the architecture and implementation of next-generation AI agents, from conceptualization to production deployment
  • Drive technical direction and strategy for the AI agent platform, ensuring scalability, reliability, and performance
  • Mentor and guide junior and senior engineers, fostering a culture of technical excellence and best practices
  • Collaborate with Product Managers and other engineering teams to define requirements and deliver impactful solutions
  • Conduct complex code reviews, system design reviews, and provide constructive feedback
  • Identify and address technical debt, performance bottlenecks, and architectural challenges within the agent infrastructure
  • Stay current with the latest advancements in AI, machine learning, and software engineering to continually improve our technology stack
What we offer
  • Incentive pay program
  • Company vehicle evaluation program
  • Relocation benefits
Employment type: Full-time

Staff Software Engineer (L4)

As a Staff Engineer on the Twilio Segment Data platform/ pipelines team, you’ll ...
Location: India
Salary: Not provided
Company: Stytch
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
  • Hands-on experience with high-scale messaging/streaming systems (several thousand events/sec) and processing engines (1M+ events/sec).
  • 8+ years of experience writing production-grade code in a modern programming language
  • Strong theoretical fundamentals and hands-on experience designing and implementing highly available and performant fault-tolerant distributed systems.
  • Experience programming in one or more of the following: Go, Java, Scala, or similar languages
  • Well-versed in concurrent programming, along with a solid grasp of Linux systems and networking concepts.
  • Experience operating large-scale, distributed systems on top of cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform (GCP)
  • Experience in message passing systems (e.g., Kafka, AWS Kinesis) and/or modern stream processing systems (e.g., Spark, Flink).
  • Hands-on experience with container orchestration frameworks (e.g., Kubernetes, EKS, ECS)
  • Leverage best-in-class development productivity practices including AI tooling.
Job Responsibility
  • Design and deliver robust, high-scale routing experiences for the Data platform/ pipelines team for Twilio Segment.
  • Ship features that opt for high availability and throughput with eventual consistency
  • Collaborate with engineering and product leads, as well as teams across Twilio Segment
  • Support the reliability and security of the platform
  • Build and optimize globally available and highly scalable distributed systems
  • Be able to act as a team Tech Lead as needed
  • Mentor other engineers on the team in technical architecture and design
  • Partner with application teams to deliver end to end customer success.
What we offer
  • Competitive pay
  • generous time off
  • ample parental and wellness leave
  • healthcare
  • retirement savings program
  • and much more.

Staff Software Engineer

We are looking for a Staff Engineer to join the Core Services team under Aurora ...
Location: United States, Mountain View
Salary: 189000.00 - 303000.00 USD / Year
Company: Aurora Innovation
Expiration Date: Until further notice
Requirements
  • 7 or more years of experience in building backend services
  • Bachelor’s or Master’s Degree in Computer Science or a related field
  • Experience with web communication protocols, including REST, gRPC and GraphQL
  • Experience building large-scale, highly concurrent, high-throughput, and safety-critical backend services
  • Phenomenal communication skills
  • A preference for action
  • The drive to make teams stronger
  • Motivation to own the product lifecycle end to end
Job Responsibility
  • Design complex systems from the ground up, partnering closely with Software, Hardware and infrastructure engineering teams
  • Partner with Product Managers, Designers and Operation Stakeholders to deliver the benefit of Self Driving Vehicles quickly, safely and broadly
  • Design, implement, and maintain a micro-backend architecture running in Aurora’s AWS cloud, used to monitor and manage the entire Aurora Commercial Fleet
  • Design and implement the fleet management solution and the vehicle communication system
  • Establish technology and infrastructure to scale our products with high availability and reliability
  • Contribute to and evolve our team culture around mentorship, feedback, collaboration, and engineering excellence
What we offer
  • Annual bonus
  • Equity compensation
  • Benefits
Employment type: Full-time