Staff Software Engineer, Reliability Job at General Motors (Austin)

Staff Engineer, Software Reliability Engineering

We are seeking a Staff Engineer to join our dynamic team in Bengaluru, India. In...

Location

India, Bengaluru

Salary:

Not provided

Sandisk

Expiration Date

Until further notice

Requirements

Bachelor's degree in CSE, ECE, EEE, Software Engineering, or a related field
Master's degree preferred
5 years of software development experience in Python scripting and test case development
Advanced proficiency in programming languages such as Java, Python, or C++
Proficient in version control systems, preferably GitHub
Solid understanding of software architecture and design patterns
Experience with API development and integration
Strong skills in performance optimization and debugging
Experience with Agile methodologies and full software development lifecycle
Excellent problem-solving and analytical skills

Job Responsibility

Architect, design, and implement a high-performance, scalable test suite for reliability testing
Collaborate with cross-functional teams to define and implement new features and products
Lead code reviews and provide mentorship to junior developers
Optimize test performance and ensure high-quality, efficient code
Troubleshoot and resolve complex technical issues
Stay current with emerging technologies and industry trends, recommending improvements to our technology stack
Contribute to the development of technical standards and best practices
Participate in Agile ceremonies and help drive continuous improvement in our development processes

Fulltime

Reliability Staff Software Engineer - OpenSearch

We're seeking a skilled Staff Software Engineer with leadership ambition, to joi...

Location

United Kingdom, London

Salary:

Not provided

Optimizely

Expiration Date

Until further notice

Requirements

Bachelor’s Degree (Computer Science or engineering preferred) or equivalent work experience
Significant experience designing, implementing, and maintaining SaaS with high traffic load
Several years of experience directly managing scalable and reliable Elasticsearch and/or OpenSearch clusters
Experience with TypeScript, JavaScript, C#
Experience with GraphQL, REST
Experience with Cloudflare Workers, Kubernetes
Experience with OpenSearch

Job Responsibility

Architect, implement, and optimize OpenSearch indexing and query pipelines for scalability and reliability
Design and maintain backup, disaster recovery, and failover strategies for OpenSearch clusters
Lead root cause analysis and resolution of complex search-related incidents and performance bottlenecks
Drive automation for cluster provisioning, upgrades, and configuration management (e.g., with Terraform, Ansible, or Kubernetes)
Mentor engineers on OpenSearch internals, query optimization, and troubleshooting
Collaborate with product and engineering teams to translate business requirements into robust search features
Own capacity planning and cost optimization for search infrastructure
Author technical documentation and best practices for search development and operations

Staff Software Engineer - Site Reliability

Ironclad is the leading AI contracting platform that transforms agreements into ...

Location

United States, San Francisco; New York City

Salary:

210000.00 - 235000.00 USD / Year

Ironclad

Expiration Date

Until further notice

Requirements

Minimum of 5 years of experience in a Site Reliability Engineering / DevOps role
Expert knowledge of Docker and Kubernetes; Crossplane experience is a plus
Strong knowledge of cloud platforms such as AWS and Google Cloud
Proficiency in scripting and programming languages like Python, TypeScript, or Bash
Experience with infrastructure-as-code tools like Terraform or Pulumi
Strong troubleshooting and analytical skills, drive to help customers, and the ability to dive deep and learn a new product
Experience with CI/CD pipelines and deployment automation tools such as CircleCI and ArgoCD
Strong understanding of networking and security principles

Job Responsibility

Be part of the Cloud Platform SRE Team, focused on building our Cloud Platform using modern tools and best practices
Champion SRE best practices within the team and throughout the organization
Ensure the reliability, availability, and performance of services and infrastructure
Solve the whole problem. Design, implement, and maintain scalable systems
Automate repetitive operational tasks to streamline processes
Monitor system performance and troubleshoot issues proactively
Develop and document best practices for system operations
Collaborate with development teams to enhance system design
Manage incident responses and perform root cause analysis
Participate in on-call rotations to handle critical issues as they arise

What we offer

100% health coverage for employees (medical, dental, and vision), and 75% coverage for dependents with buy-up plan options available
Market-leading leave policies, including gender-neutral parental leave and compassionate leave
Family forming support through Maven for you and your partner
Paid time off - take the time you need, when you need it
Monthly stipends for wellbeing, hybrid work, and (if applicable) cell phone use
Mental health support through Modern Health, including therapy, coaching, and digital tools
Pre-tax commuter benefits (US Employees)
401(k) plan with Fidelity with employer match (US Employees)
Regular team events to connect, recharge, and have fun
And most importantly: the opportunity to help build the company you want to work at

Fulltime

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...

Location

United States, Santa Clara

Salary:

126000.00 - 203500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
Strong problem-solving skills and ability to work across teams

Job Responsibility

Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
Lead improvements across production systems, including performance, availability, and incident response
Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
Partner with development teams to improve system reliability, observability, and cloud-native design patterns
Define and implement monitoring, alerting, and observability strategies across distributed systems
Lead incident response efforts, including root cause analysis and long-term remediation strategies
Identify and eliminate operational toil through automation and system improvements
Mentor engineers and contribute to raising the bar for production engineering practices

What we offer

Restricted stock units
Bonus

Fulltime

Staff Software Development Engineer-Automation Engineer

We’re building a world of health around every individual — shaping a more connec...

Location

United States

Salary:

106605.00 USD / Year

CVS Health

Expiration Date

June 29, 2026

Requirements

Extensive experience in software development and production support for enterprise systems
Strong expertise in automation/RPA platforms, scripting, and debugging complex workflows
Proven ability to lead incident response and root cause analysis in high-availability environments
Deep understanding of SDLC, CI/CD, release management, and production readiness standards
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

Job Responsibility

Serve as the technical owner for production support of automation and RPA solutions across critical business processes
Lead incident triage, root cause analysis, and permanent remediation for high-severity automation failures
Establish and enforce runbooks, support models, escalation paths, and on-call readiness for automation platforms
Proactively identify systemic issues and implement stability, resiliency, and performance improvements
Provide hands-on technical leadership for automation design, debugging, and optimization in production environments
Review automation code and configurations to ensure adherence to standards, security, and reliability best practices
Partner with development teams to ensure production readiness of new automations before release
Guide architectural decisions that reduce operational complexity and technical debt
Design and maintain monitoring, alerting, and health dashboards for automation platforms
Drive adoption of AIOps, SRE, and automation-first support practices where applicable

What we offer

Medical, dental, and vision coverage
Paid time off
Retirement savings options
Wellness programs

Fulltime

Staff Software Engineer, Vehicle AI

Work Arrangement: This role is categorized as hybrid. This means the successful ...

Location

United States, Mountain View

Salary:

189300.00 - 290000.00 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
8+ years of professional software development experience, with a focus on large-scale distributed systems or AI/ML infrastructure
Expert proficiency in one or more programming languages such as Python, C++, Java, or Kotlin
Extensive experience designing, building, and deploying production-grade AI/ML models or intelligent agents
Demonstrated technical leadership in complex projects, including mentoring and driving cross-functional initiatives

Job Responsibility

Lead the architecture and implementation of next-generation AI agents, from conceptualization to production deployment
Drive technical direction and strategy for the AI agent platform, ensuring scalability, reliability, and performance
Mentor and guide junior and senior engineers, fostering a culture of technical excellence and best practices
Collaborate with Product Managers and other engineering teams to define requirements and deliver impactful solutions
Conduct complex code reviews and system design reviews, and provide constructive feedback
Identify and address technical debt, performance bottlenecks, and architectural challenges within the agent infrastructure
Stay current with the latest advancements in AI, machine learning, and software engineering to continually improve our technology stack

What we offer

Incentive pay program
Company vehicle evaluation program
Relocation benefits

Fulltime

Staff Software Engineer (L4)

As a Staff Engineer on the Twilio Segment Data platform/ pipelines team, you’ll ...

Location

India

Salary:

Not provided

Stytch

Expiration Date

Until further notice

Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
Hands-on experience with high-scale messaging/streaming systems (several thousand events/sec) and processing engines (1M+ events/sec).
8+ years of experience writing production-grade code in a modern programming language
Strong theoretical fundamentals and hands-on experience designing and implementing highly available and performant fault-tolerant distributed systems.
Experience programming in one or more of the following: Go, Java, Scala, or similar languages
Well-versed in concurrent programming, along with a solid grasp of Linux systems and networking concepts.
Experience operating large-scale, distributed systems on top of cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform (GCP)
Experience in message passing systems (e.g., Kafka, AWS Kinesis) and/or modern stream processing systems (e.g., Spark, Flink).
Hands-on experience with container orchestration frameworks (e.g., Kubernetes, EKS, ECS)
Leverage best-in-class development productivity practices including AI tooling.

Job Responsibility

Design and deliver robust, high-scale routing experiences for the Data platform/pipelines team for Twilio Segment.
Ship features that opt for high availability and throughput with eventual consistency
Collaborate with engineering and product leads, as well as teams across Twilio Segment
Support the reliability and security of the platform
Build and optimize globally available and highly scalable distributed systems
Be able to act as a team Tech Lead as needed
Mentor other engineers on the team in technical architecture and design
Partner with application teams to deliver end to end customer success.

What we offer

Competitive pay
Generous time off
Ample parental and wellness leave
Healthcare
Retirement savings program
And much more