The rapid adoption of advanced software in vehicles marks a new era for automakers and consumers, bringing both advantages and challenges. As part of the Site Reliability Engineering (SRE) database group at General Motors, you'll join a dedicated team focused on enhancing the reliability, efficiency, and scalability of our distributed database systems. We leverage engineering principles to manage operations effectively and build solutions that enable us to grow without sacrificing performance or quality.
Our SREs work closely with software development teams, acting as specialists in reliability and production engineering, with a focus on automation, observability, and shared responsibility. We are looking for individuals who are passionate about maintaining the health of our infrastructure while optimizing for reliability and cost-efficiency. This role involves a blend of database engineering and systems engineering skills to keep our services resilient, robust, and scalable.
The Role:
The database team within the SRE organization is chartered to provide best-in-class Database Management System (DBMS) project solutions to our application partners worldwide. This role involves modernizing our infrastructure and processes to deliver a database-as-a-service capability in a highly standardized, reliable, and automated environment. The team participates in all phases of the application development life cycle, designing, developing, and deploying databases on behalf of the application in a way that ensures GM’s data is secure, highly available, current, flexible, and monitored. This individual will work on transforming GM applications and database services into modernized cloud offerings.
Job Responsibility:
Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention
Lead, implement, and improve monitoring and observability frameworks, enabling proactive detection and resolution of incidents
Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution
Work alongside developers to ensure the quality, scalability, and reliability of our database services. Practice shared ownership of services in production, fostering a 'You build it, you run it' culture
Manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to set and meet reliability expectations
Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence. Champion a culture of continuous improvement
Evaluate system performance and advocate for optimizations that reduce infrastructure costs while maintaining service reliability
Requirements:
Bachelor’s degree in computer science or a related field, or equivalent work experience
7-10 years of software experience, with strong proficiency in PostgreSQL and at least one other database technology (e.g., Oracle, SQL Server)
Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems
Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures
Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures. Ability to optimize or troubleshoot code by understanding its execution and the impact on system resources
Experience handling production incidents, including root cause analysis, mitigation, and working through complex system failures
Strong communication skills, with an ability to explain technical concepts to both engineering and business stakeholders. Commitment to collaborative problem-solving and shared ownership of services
Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems
Nice to have:
Experience with Git/source code management, CI/CD development, and open-source development
Hands-on experience with Infrastructure as Code tools like Terraform, Terragrunt, Azure Resource Manager (ARM) templates, YAML pipelines, or Bicep
Experience with Fivetran or GoldenGate configuration and operation
Experience with Cosmos DB or other NoSQL technologies
Experience with cloud platforms (AWS, GCP, Azure)
Experience with observability using OpenTelemetry, Prometheus, or services such as Datadog
Familiarity with container orchestration systems like Kubernetes
A track record of managing or developing distributed systems