Software Engineer, Reliability Platforms

United States, San Francisco · Employment contract · 159800.00 - 235000.00 USD / Year · Job Posted June 16, 2026

Job Description

The Reliability Platform role is a key pillar of DoorDash’s Production Lifecycle team, alongside Observability and Deploy Platform. This group’s mandate is to enable users and agents to reason about the health of our services, facilitate change control safety, and provide the means to rapidly address any unexpected state.

Ownership is fundamental in DoorDash culture, and all teams own what they build. We are not here to operate services on others’ behalf, but to provide tools that enable their success and ensure a consistently high level of quality for everything we do. We approach challenges with the pragmatic perspective of an SRE, and deliver solutions with the mindset of a SWE who detests toil and repetitive tasks. We use software and agents to “keep the lights on” and focus our energy on innovation that will level up the entire organization.

This mission falls into three main categories:

  • Service Health: Provide SLO frameworks, analytics tools, and AI Agent enablement to extract high-quality insights from our telemetry and pinpoint faults or highlight deficiencies
  • Change Orchestration: Provide self-service provisioning orchestration, evolving from UI-driven to Agent-driven, so that our developers can safely affect production from their IDE
  • Incident Management: Define and deliver the tools, processes, and policies our peers use to quickly understand and recover from any unexpected issues in the environment

As a Software Engineer on the Reliability Platform team, you’ll help design, build, and operate services and infrastructure that deliver on the team’s broad mandate described above.

Job Responsibility

  • Design, build, and operate services and infrastructure that deliver on the team’s broad mandate
  • Deliver innovative capabilities
  • Build great infrastructure
  • Balance practical and possible
  • Be customer obsessed
  • Automate everything
  • Shape the future of operations

Requirements

  • 5+ years of experience in an infrastructure, platform, or backend engineering role
  • Fluent in Go (or a similar language)
  • Comfortable with AWS primitives, security best practices, containerization, and Infrastructure as Code tools like Terraform or Pulumi
  • Understands concepts like SLOs, error budgets, and incident response
  • Platform Engineering Mindset: You think in terms of APIs, abstractions, and workflows
  • Backend Development Skills
  • Cloud/Infra Expertise
  • SRE Experience
  • Flexibility
  • AI Alignment: You embrace the use of AI tools to be a more productive and capable engineer
  • Curiosity About the Future: You’re excited about automation and agentic, AI-assisted operations

What we offer

  • 401(k) plan with employer matching
  • 16 weeks of paid parental leave
  • wellness benefits
  • commuter benefits match
  • paid time off
  • paid sick leave
  • medical, dental, and vision benefits
  • 11 paid holidays
  • disability and basic life insurance
  • family-forming assistance
  • mental health program
  • flexible paid time off/vacation
  • 80 hours of paid sick time per year

Similar Jobs for Software Engineer, Reliability Platforms

8 matching positions

Senior Software Engineer / Principal Software Engineer - Copilot CLI

Within GitHub and Microsoft CoreAI, the Copilot CLI team builds GitHub's coding ...
Location: United States, Redmond
Salary: 119800.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years
Job Responsibility
  • Take ownership of critical product and platform areas of the Copilot CLI and shared agent runtime
  • Set a high technical and quality bar for agentic systems and developer-facing tooling
  • Design and ship performant, reliable terminal experiences that developers depend on for daily work
  • Use data, benchmarks, and direct user feedback to guide iteration and investment
  • Collaborate across org boundaries to enable other teams to build agentic products on top of a shared foundation
  • Influence architecture, technical direction, and engineering standards beyond your immediate team
What we offer
  • Certain roles may be eligible for benefits and other compensation
  • Fulltime

Senior Software Engineer and Software Engineer II

OneDrive and SharePoint are rapidly growing services at the center of Microsoft'...
Location: United States, Redmond
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience related to cloud-scale distributed design and patterns
  • The ability to deliver informed designs and plans ahead of production and execution
  • Knowledge of others' expertise and the ability to involve multiple players (within and outside the organization) in the creation or development of novel products, processes, or research streams
Job Responsibility
  • Design and deliver systems that enable partners and ISVs to migrate from other cloud providers, improve core systems performance and efficiencies, and ensure zero customer impact throughout the change management cycle
  • Deliver systems to meet our business continuity planning goals, provide telemetry for optimizing the service, and drive down our response time for detecting and resolving service issues
  • Create, implement, optimize, debug, refactor, and reuse code to establish and improve performance, maintainability, effectiveness, and return on investment (ROI)
  • Contribute to the identification of dependencies and the development of design documents for a product area with little oversight
  • Help identify other teams and technologies that will be leveraged, how they will interact, and when one's system may provide support to others
  • Contribute to determining back-end dependencies associated with product, application, service, or platform functionality for product features
  • Understand downstream effects of solutions and work provided
  • Help identify areas of dependency and overlap with other teams or team members and drive coordination
  • Remain current in skills by investing time and effort into staying abreast of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale
  • Review work items to deepen knowledge of product features in partnership with appropriate stakeholders (e.g., project managers) and execute project plans, release plans, and work items
  • Fulltime

Software engineer 2 / Senior Software engineer - Azure Data

Microsoft's Azure Data engineering team is leading the transformation of analyti...
Location: India, Bangalore
Salary: Not provided
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience with the Azure stack including Storage, Compute, Networking, Fabric, Purview, Synapse, AKS, DevOps, Data Factory, or Power BI
  • Experience with big data technologies such as Spark, Kafka, Hadoop, or HBase
  • Experience building data lake or data engineering products, tools, or pipelines
  • Familiarity with container-based architectures (Docker, Kubernetes)
  • Ability to debug complex distributed systems on Linux and/or Windows platforms
Job Responsibility
  • Write extensible, maintainable code in C#, Java, Scala, or Python for Fabric Materialized Lake View services and HDInsight components
  • Use AI tools and coding best practices across the development lifecycle
  • Design data refresh, scheduling, and query optimisation features with minimal supervision
  • Review code from teammates for correctness, test coverage, security risks, and adherence to team standards
  • Coach junior engineers through code reviews
  • Debug complex issues in distributed systems running on Azure, Linux, and Windows
  • Run live site operations on a rotational, on-call basis
  • Integrate logging and instrumentation to gather telemetry on system health, performance, reliability, and security
  • Work with product managers, technical leads, and partners across geographies to define customer requirements for Materialized Lake View features
  • Fulltime

Senior Software Engineer and Principal Software Engineer

We are building a planet-scale multi-modal database and infrastructure for execu...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++, C#, or Java OR equivalent experience
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C++, C#, or Java OR equivalent experience
  • Experience in shipping products and scalable, reliable services
  • Currently programming/coding in your current or most recent role
  • Hands on experience with asynchronous programming and concurrency (threads, tasks, futures, async/await)
  • Experience with Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS), and/or Google Kubernetes Engine (GKE)
  • Experience in building database engines, query engines, indexing solutions (columnar, full-text, vector), at scale
  • Experience with CUDA programming and AI systems at scale
Job Responsibility
  • Independently execute in the face of ambiguity
  • Lead identification of dependencies and the development of design documents for a product, application, service, or platform
  • Write efficient systems code and debug distributed systems
  • Hold accountability as a Designated Responsible Individual (DRI), mentoring engineers across products/solutions and working on-call to monitor the system/product/service for degradation, downtime, or interruptions
  • Fulltime

Software Engineer II and Senior Software Engineer

The FIO (Office Files and Identity) Team drives collaboration, identity, cloud f...
Location: United States, Redmond
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Rust OR equivalent experience
  • 4+ years industry engineering experience coding in languages including, but not limited to, C, C++, C#, Java or Rust
  • 4+ years industry experience building and shipping production quality, performant and cross-platform applications
  • Experience collaborating cross-team and cross-function to deliver software features or projects
  • Platform-specific experience building Windows, macOS, iOS, or Android applications
  • Experience designing and implementing efficient network communications, including network protocols, performance and reliability tooling, modeling and implementing complex customer scenarios across network services
  • Experience designing and implementing client-side storage stacks, with a focus on correctness, resiliency, performance, and adaptability
  • Understanding of client file system design and APIs, including advanced performance optimizations
  • Experience directly implementing large-scale data pipelines for product telemetry, with ability to evolve system as business and technical needs change
Job Responsibility
  • Design and write code
  • Work across organizations and directly with partners both in Office and across Microsoft, including other engineers and product leaders
  • Use data as the basis for decision making
  • Be a steward of products that ship to hundreds of millions of customers around the world, staying connected to customers through data and feedback and being agile and responsive to issues
  • Grow as an engineer in a modern and highly impactful team
  • Fulltime

Software Reliability Engineer

This role improves and protects software and systems supporting IT services by m...
Location: United States, Atlanta
Salary: 83900.00 - 151200.00 USD / Year
Company: T-Mobile (https://www.t-mobile.com)
Expiration Date: Until further notice
Requirements
  • Legally authorized to work in the United States
  • At least 18 years of age
  • Bachelor's Degree plus 2 years of related work experience OR combination of education and experience deemed equivalent (Required)
  • 2-4 years of relevant experience (Preferred)
  • Experience working in an Agile and DevOps environment (Preferred)
  • Experience in one or more of: C, C#, Java, Perl, Python, Go, or scripting experience in Shell and Perl (Preferred)
  • Experience with Continuous Integration/Continuous Delivery tools such as Jenkins and CloudBees, and with other automation tools (Preferred)
  • Experience with DevOps tools such as Ansible, Chef, or Puppet; experience with Docker or Kubernetes is preferable (Preferred)
  • Experience with an APM tool such as AppDynamics and a logging tool such as Splunk (Preferred)
  • Experience working in a cloud environment (public/private) (Preferred)
Job Responsibility
  • Apply DevOps automation tools to manage CI/CD pipelines and configuration for production and non-production environments
  • Perform environment management and automated server provisioning to support scalable infrastructure
  • Deliver software improvements that improve availability, scalability, latency, and efficiency of IT services
  • Create and manage dashboards, alerts, logging standards, and health checks to improve service quality, supportability, and visibility across services
  • Contribute to software delivery process improvements including cloud enablement, containerization, and deployment automation
  • Support cloud-native applications, APIs, microservices, and platform operations across production and non-production environments
  • Troubleshoot production incidents, participate in root cause analysis, and support implementation of long-term reliability improvements with assistance from leadership and senior technical team members
  • Partner with Software Engineering, DevOps, and platform teams to improve application resiliency, scalability, and deployment automation under established technical direction
  • Contribute to operational readiness activities, including release validation, capacity planning, disaster recovery support, and environment support, under the guidance of senior leadership
  • Participate in Agile ceremonies, production support activities, and continuous improvement initiatives
What we offer
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Flexible spending account
  • Paid time off
  • Up to 12 paid holidays
  • Fulltime

Software Engineer - Reliability

You're a software engineer who enjoys solving complex engineering problems. Your...
Location: United Kingdom, North West
Salary: 45000.00 - 55000.00 GBP / Year
Company: Linux Recruit (linuxrecruit.co.uk)
Expiration Date: Until further notice
Requirements
  • Experience in software engineering
  • Proficiency in Python, Golang or JavaScript
  • Experience with automation, monitoring and performance solutions
  • Knowledge of cloud-native container environments
  • Understanding of full software development lifecycle
  • Collaboration with platform engineers, developers and operations specialists
Job Responsibility
  • Design and build internal tools for development and operations teams
  • Develop automation, monitoring and performance solutions
  • Apply engineering principles to reliability challenges
  • Partner with platform engineers, developers and operations specialists to improve system stability and scalability
What we offer
  • Generous pension
  • Holiday
  • Bonus
  • Free gym membership
  • Hybrid working
  • Fulltime

Staff Software Engineer, Reliability

The AV platform team develops the first layers of software on the GM Autonomous ...
Location: United States, Austin; Mountain View
Salary: 160200.00 - 290700.00 USD / Year
Company: General Motors (gm.com)
Expiration Date: Until further notice
Requirements
  • 6+ years of professional experience with multi-sensor system services and frameworks
  • Bachelor's degree in a relevant field or relevant work experience
  • Proven experience writing production software to improve data quality and reliability of safety critical systems including root cause and corrective actions
  • Proficiency with C++11 or later and Python
  • Proficiency in debugging and troubleshooting firmware-related issues
  • Experience driving complex embedded software projects through the full lifecycle of product development
  • Experience architecting and delivering Embedded Systems solutions that support multiple generations of the product
  • Experience engaging in communication at senior management levels and influencing technical strategies
  • Experience applying software development best practices and mentoring team members on them
  • Clear and concise written and verbal communication skills
Job Responsibility
  • Collaborate with hardware, systems engineering, program management, product management and peer software teams to develop critical reliability software features for the autonomous vehicle
  • Perform root-cause analysis of complex problems involving multiple cross-functional partners, including hardware and software
  • Identify reliability issue trends, provide clear guidance on reliability requirements, develop reliability design guidelines, and apply lessons learned to enable continuous improvement
  • Design and implement shared infrastructure and tooling among the AV Platform teams to monitor and analyze embedded software and data quality metrics
  • Own the development quality and ensure the solutions are scalable, secure, and optimized for customer experience and performance
  • Partner with cross-functional teams to architect and implement embedded software observability and monitoring solutions
  • Work with the engineering teams to architect and build services to simplify troubleshooting and operational response to incidents and Autonomous Vehicles fleet outages
  • Own technical projects, participate in design reviews and provide input for the reliability section of others’ design reviews
  • Ensure efficiency of the vehicle change process involving embedded software changes and dependencies
  • Participate in on-call rotation
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • Fulltime