Senior Manager, Hybrid Services & Reliability (SRE) Job at General Motors (Austin, Texas)

Senior Engineer, Hybrid Cloud Fabric

Become a key player in GEICO's tech transformation! We are seeking a Senior or S...

Location

United States, Palo Alto, CA; Dallas, TX; Seattle, WA

Salary:

100000.00 - 215000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Service mesh expertise (dev): familiar with mesh architecture, components, and configuration options, including advanced traffic management, security policies, and telemetry customization
Service mesh experience (ops): designed, implemented, and managed service mesh solutions at scale, addressing challenges related to performance, security, and observability
Programming skills: Experience with Go is a must; Rust is a bonus
Linux OS: In-depth knowledge of Linux operating systems, including performance tuning, troubleshooting, and security best practices
Networking: Advanced understanding of networking concepts and tools (e.g., iptables, netfilter, traffic shaping) for analyzing and optimizing service mesh performance within the hybrid cloud environment
Kubernetes and containerization: Extensive experience with Kubernetes and container orchestration platforms, including networking, security, and service management
Microservices architecture: Deep understanding of microservices design patterns, service discovery mechanisms, API gateways, and distributed tracing
Observability and monitoring: Expertise in tools like Prometheus, Grafana, Jaeger, and Kiali to monitor service mesh performance and troubleshoot issues
Security best practices: Knowledge of zero-trust security principles, authentication and authorization mechanisms, and encryption technologies within the context of service mesh

Job Responsibility

Design and implement a robust service mesh architecture, encompassing traffic management, security, observability, and resilience for microservices across public and private clouds within our on-premises data centers
Integrate the service mesh with existing infrastructure and applications, ensuring seamless operation and interoperability with various platforms and technologies, including legacy systems
Establish and enforce service mesh best practices, including security policies, traffic routing rules, circuit breakers, and access control mechanisms, to maintain a secure and reliable application environment
Develop comprehensive monitoring and observability dashboards to provide deep insights into service mesh health, performance, and potential issues, enabling proactive problem identification and resolution
Guide and mentor engineers on service mesh principles and best practices, fostering knowledge sharing and expertise development within the team, empowering them to contribute effectively to the service mesh implementation
Work closely with networking and security teams to ensure secure and efficient integration of the service mesh with on-premises infrastructure and networks, addressing potential challenges and ensuring smooth operation
Partner with SREs to establish service mesh observability, monitoring, and alerting strategies for maintaining high availability and performance, collaborating to define SLOs, SLIs, and error budgets
Actively engage with the Istio community, contribute to open-source projects, and represent GEICO's leadership in service mesh adoption

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Full-time

Senior Staff Engineer, Hybrid Cloud Fabric

Become a key player in GEICO's tech transformation! We are seeking a Senior or S...

Location

United States, Palo Alto; Dallas; Chevy Chase; Seattle

Salary:

120000.00 - 260000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Service mesh expertise (dev): familiar with mesh architecture, components, and configuration options, including advanced traffic management, security policies, and telemetry customization
Service mesh experience (ops): designed, implemented, and managed service mesh solutions at scale, addressing challenges related to performance, security, and observability
Programming skills: Experience with Go is a must; Rust is a bonus
Linux OS: In-depth knowledge of Linux operating systems, including performance tuning, troubleshooting, and security best practices
Networking: Advanced understanding of networking concepts and tools (e.g., iptables, netfilter, traffic shaping) for analyzing and optimizing service mesh performance within the hybrid cloud environment
Kubernetes and containerization: Extensive experience with Kubernetes and container orchestration platforms, including networking, security, and service management
Microservices architecture: Deep understanding of microservices design patterns, service discovery mechanisms, API gateways, and distributed tracing
Observability and monitoring: Expertise in tools like Prometheus, Grafana, Jaeger, and Kiali to monitor service mesh performance and troubleshoot issues
Security best practices: Knowledge of zero-trust security principles, authentication and authorization mechanisms, and encryption technologies within the context of service mesh

Job Responsibility

Design and implement a robust service mesh architecture, encompassing traffic management, security, observability, and resilience for microservices across public and private clouds within our on-premises data centers
Integrate the service mesh with existing infrastructure and applications, ensuring seamless operation and interoperability with various platforms and technologies, including legacy systems
Establish and enforce service mesh best practices, including security policies, traffic routing rules, circuit breakers, and access control mechanisms, to maintain a secure and reliable application environment
Develop comprehensive monitoring and observability dashboards to provide deep insights into service mesh health, performance, and potential issues, enabling proactive problem identification and resolution
Guide and mentor engineers on service mesh principles and best practices, fostering knowledge sharing and expertise development within the team, empowering them to contribute effectively to the service mesh implementation
Work closely with networking and security teams to ensure secure and efficient integration of the service mesh with on-premises infrastructure and networks, addressing potential challenges and ensuring smooth operation
Partner with SREs to establish service mesh observability, monitoring, and alerting strategies for maintaining high availability and performance, collaborating to define SLOs, SLIs, and error budgets
Actively engage with the Istio community, contribute to open-source projects, and represent GEICO's leadership in service mesh adoption

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Full-time

Director, Architect Enterprise Resilience & Recoverability

Location

USA, Bethesda

Salary:

Not provided

Marriott Bonvoy

Expiration Date

June 19, 2026

Requirements

Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related discipline - or equivalent professional experience and certifications
8+ years of progressive experience in systems, infrastructure, cloud, or platform engineering within a large enterprise environment, including 5+ years specifically in resiliency engineering, disaster recovery, or reliability engineering at scale
Demonstrated experience as a senior technical authority - architect, principal engineer, or technical director - for enterprise resiliency and/or disaster recovery programs and for live recovery events
Proven experience designing and validating end-to-end DR and high-availability architectures for enterprise-scale workloads across cloud (AWS, Azure, GCP, or Alibaba), hybrid, and on-premises environments
Experience aligning technical recovery designs to business recovery objectives (RTO, RPO, business criticality) and translating between business impact and technical implementation
Deep working knowledge of cloud-native resiliency patterns: multi-AZ and multi-region designs, redundancy and fault tolerance, automated failover, dynamic traffic management, and adaptive connectivity
Strong recoverability foundation: backup and restore integrity, immutable and versioned backup, ransomware recovery frameworks, isolated recovery environments, and cross-region recovery patterns
Familiarity with infrastructure-as-code and automation tooling (e.g., Terraform, Ansible, CloudFormation) applied to DR orchestration, validation, and drift detection
Experience with containerized and distributed systems, including Kubernetes, service mesh, and platform-level resiliency patterns
Demonstrated ability to influence and drive accountability across a highly matrixed organization without direct authority - across application, infrastructure, cloud, network, SRE, security, and vendor teams

Job Responsibility

Accountable for the technical strategy, architecture, and engineering execution of resiliency and recoverability across Marriott’s global technology estate - spanning AWS, Azure, Alibaba, hybrid cloud, on-premises, and partner-hosted workloads supporting hundreds of properties worldwide
Own the architectural roadmap for engineered, continuously tested resilience across the most critical revenue-supporting platforms
Serve as the single technical leader unifying resiliency (preventative, design-time) and recoverability (operational, response-time) under a single coherent strategy
Partner with major modernization and consolidation programs to ensure new and migrating platforms are recoverable by design, with repeatable failover and verified transaction success for prioritized critical workloads
Establish and chair architectural standards, production readiness criteria, and resiliency review gates that govern how new and changed systems enter production
Breaks down complex technical problems and drives to the best technical decision through a high level of communication, debate, and discussion within the team and with other subject matter experts
Performs research in technologies that are emerging in the industry as a competitive advantage and reports on that research in terms of business opportunities
Advises on the viability of emerging technologies for the business and articulates the risks, costs, and ROI
Provides guidance to improve operational processes and procedures to improve service, reduce costs, and leverage technologies

What we offer

401(k) plan
stock purchase plan
discounts at Marriott properties
commuter benefits
employee assistance plan
childcare discounts
medical
dental
vision
health care flexible spending account

Full-time

Senior IAM Automation Engineer

We’re seeking a Senior IAM Automation Engineer to transform how Apex manages wor...

Location

United States, Austin

Salary:

108800.00 - 136000.00 USD / Year

Apex Clearing

Expiration Date

Until further notice

Requirements

7-10+ years in DevOps, SRE, or software engineering roles with significant IAM/identity automation focus
Demonstrated experience building automation solutions for enterprise IAM platforms using APIs, scripting, and infrastructure-as-code
Track record of implementing workflow automation or orchestration platforms in production environments
Understanding of both technical IAM implementations and business processes (joiner/mover/leaver, access requests, compliance)
Experience working in hybrid on-premises and cloud environments
Software development proficiency - 5+ years writing production code (Python, PowerShell, Go, or similar) with strong API and SDK integration experience
IAM architecture skills - Deep understanding of SSO protocols (SAML, OIDC), provisioning standards (SCIM), directory services (Active Directory, Entra ID), and enterprise IAM platforms (Okta strongly preferred)
Infrastructure-as-Code mastery - Hands-on experience with Terraform, Ansible, or similar tools, plus CI/CD pipelines for automated deployments
DevOps/SRE practices - Experience building observable, reliable systems with appropriate monitoring, logging, and incident response capabilities
Workflow automation platforms - Demonstrated ability to implement and govern low-code/code-first automation tools (Tines, Workato, n8n, or similar)

Job Responsibility

Lead Tines platform implementation and governance - Define technical standards, architect RBAC models, and build workflows that automate employee lifecycle management, access requests, and certification campaigns
Build infrastructure-as-code for identity systems - Develop and maintain Terraform, PowerShell, and Python automation across hybrid infrastructure (on-prem AD/Adaxes, Entra ID, Okta, AWS IAM, GCP/GCI) to enable repeatable, version-controlled deployments with proper change management
Design API-driven automation and integrations - Architect scalable solutions that orchestrate identity workflows across HRIS (Workday), ticketing (ServiceNow), collaboration platforms (Slack, Teams, M365), and enterprise applications, leveraging APIs and SDKs to eliminate manual processes
Implement observability and self-healing capabilities - Build monitoring, alerting, and automated remediation for identity systems to reduce operational toil, improve reliability, and enable proactive issue detection across authentication flows and provisioning processes
Enable rapid application onboarding - Create automation frameworks and integration patterns that allow the business to onboard new SaaS applications with minimal manual intervention while maintaining security and compliance standards
Pioneer non-human identity (NHI) governance - Partner with SecOps to develop policies, controls, and automation for managing AI agents, LLM API keys, service accounts, bot identities, and machine-to-machine authentication as AI adoption accelerates across the organization
Mentor and develop junior team members - Share your hard-won experience and technical expertise to elevate the team’s capabilities. Conduct code reviews, pair programming sessions, and knowledge transfer that builds automation skills, IAM expertise, and engineering judgment across the team
Drive technical innovation in the identity space - Evaluate emerging tools and practices, establish CI/CD pipelines for IAM deployments, and leverage AI-powered development tools (LLMs, code generation, AI assistants) responsibly to accelerate automation delivery and stay ahead of business needs

What we offer

Healthcare benefits (medical, dental and vision, EAP)
competitive PTO
401k match
parental leave
HSA contribution match
paid subscription to the Calm app
generous external learning and tuition reimbursement benefits

Full-time

Senior Staff Engineer, Software

Our Senior Staff Software Engineer works with our Managers, Distinguished and Sr...

Location

United States, Chevy Chase; Palo Alto; Dallas; Seattle

Salary:

120000.00 - 260000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Fluency in at least one modern language (Go/Python preferred)
Understanding of compression algorithms, deduplication, encryption, and error correction
Understanding of SQL and NoSQL databases, including stateful services management and storage
Understanding of networking, caches, key/value stores, load balancing, global load balancing, queues, DNS and CDN
Deep knowledge of SRE practices, methodologies, and principles, along with a solid understanding of on prem and public cloud-based network, compute, and storage technologies
In-depth knowledge of hybrid cloud architecture, IaaS and PaaS technologies, container orchestration platforms (e.g., Kubernetes), cloud efficiency, and observability
Strong background in incident management
Ability to create incident response playbooks, runbooks, incident triaging strategies, and post-incident analysis to drive continuous improvement in system reliability and availability
Experience with open-source management and monitoring tools
Experience with infrastructure automation, tooling, and configuration management frameworks (e.g., Puppet, Chef, Ansible, Pulumi, Terraform, etc.)

Job Responsibility

Develop and drive the overall strategy for the Business Continuity and Disaster Recovery (BCDR) organization, aligning it with the organization's business goals and objectives
Provide thought leadership in BCDR, staying ahead of industry trends and emerging technologies to enhance our backup/restore posture
Conduct comprehensive risk assessments to identify potential threats and vulnerabilities
Design and implement robust strategies to ensure data safety, integrity and correctness
Lead the design and architecture of resilient and scalable systems, considering both on-premises and cloud-based solutions
Collaborate with cross-functional teams to integrate data safeguard best practices into the development and deployment processes
Develop and maintain comprehensive incident response plans to address various disaster scenarios on our orchestration and backup/restore systems
Conduct regular simulations and drills to ensure the readiness of the organization in the event of a disaster
Hands-on software engineering and SDLC best practices (Technical Review Documents, Architecture, Software Development, Software Reviews, Testing, Production Readiness Reviews, among others)
Evaluate, select, and implement cutting-edge technologies and tools to enhance our data safeguard capabilities including but not limited to processes, compliance, and visibility

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Full-time

Senior Staff Engineer – Change Management

GEICO is seeking an experienced Software Engineer who is passionate about buildi...

Location

United States, Chevy Chase; Austin; New York City; Seattle; Palo Alto

Salary:

110000.00 - 260000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Expertise in at least two modern programming languages (Go, Python, Java, C, C++) and object-oriented design
Strong ownership and accountability with excellent communication and collaboration skills
Hands-on experience in incident response, troubleshooting, and root cause analysis
Experience managing distributed systems in public, private, or hybrid cloud environments
Experience with monitoring, logging, and observability tools (Prometheus, Grafana, OpenTelemetry, Loki)
Passion for automation and reducing manual operations using tools like Terraform and Ansible
Familiarity with configuration management and orchestration tools (Helm, Puppet, Spinnaker)
Experience with CI/CD pipelines, Infrastructure as Code (IaC), and cloud-based deployments
Ability to operate in a fast-paced, high-scale environment with a problem-solving mindset
10+ years of professional experience in software development, platform architecture, and infrastructure management

Job Responsibility

Develop and drive the overall strategy for our enterprise Change and Approval Management, aligning it with the organization's business goals and objectives
Lead technical initiatives across multiple teams, providing strategic and technical guidance
Utilize programming languages like Go, Python, Java, and work with SQL/NoSQL databases
Work with container orchestration tools such as Docker, Kubernetes, and OpenStack
Architect and develop cloud-native applications using Azure services
Collaborate with product managers, engineering teams, and stakeholders to solve complex challenges
Ensure the quality, performance, and usability of engineering solutions
Serve as a mentor and thought leader, coaching engineers and influencing executives
Continuously improve processes, adopt best practices, and drive operational efficiency
Support and participate in on-call rotations, respond to incidents, diagnose production issues, and conduct post-incident reviews to improve system reliability

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Full-time

Senior Network Engineer

Bumble is seeking a Network Engineer to maintain a stable, predictable, controll...

Location

United Kingdom, London

Salary:

Not provided

Bumble Inc.

Expiration Date

Until further notice

Requirements

3+ years of hands-on Linux systems engineering experience (preferably rpm-based distributions such as RHEL or CentOS)
Strong diagnostic and troubleshooting skills spanning application performance, traffic-delivery issues, and complex multi-layer networking challenges
Deep understanding of networking across L1–L4 and L7, including copper/optics, Ethernet, and static/dynamic routing
Production experience with IS-IS and BGP (OSPF familiarity beneficial)
Extensive hands-on experience with Juniper MX, SRX, and QFX devices
Practical experience implementing and supporting EVPN-VXLAN architectures
Strong background in load balancing (CARP, IPVS, userspace, or enterprise solutions) and packet filtering
Experience building and supporting cloud networking architectures (VPC structures, virtual routing, firewalling, hybrid connectivity, etc.)
Proficiency with 802.1X, 802.1Q, and bonding/teaming at both the server and network hardware layers
Strong diagnostic capabilities with IPv4, ICMP, TCP, UDP, DHCP, and DNS (IPv6 is a plus)

Job Responsibility

Support and evolve Bumble’s global network infrastructure across multiple data centres and offices, including diagnostics of network subsystems within Linux servers (primarily CentOS/RHEL)
Improve network reliability and operational efficiency through configuration management, automation, and continuous optimisation of BAU tasks
Contribute to the design, implementation, and operation of cloud networking as we migrate a significant portion of our workloads into cloud environments
Collaborate closely with Systems Engineering and SRE teams, sharing networking expertise, participating in design reviews, and shaping resilient, secure platform architectures
Manage relationships with global service providers, including IP transit operators, to ensure optimal performance, availability, and accountability
Own IP address management, including subnet allocation, VLAN design, and maintaining accurate documentation
Strengthen Bumble’s security posture by contributing to perimeter defence, segmentation strategy, and proactive threat prevention
Participate in the on-call rota to maintain platform availability and support timely incident response

IT Training Lead

The IT Training Lead will drive technology learning and user adoption across the...

Location

United States, Delray Beach

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

Experience in IT training, instructional design, technical enablement, or learning and development
Strong knowledge of Microsoft 365
Excellent communication, facilitation, and content development skills
Ability to translate technical concepts into practical, user-friendly training.

Job Responsibility

Design, develop, and deliver IT training programs in instructor-led, virtual, and self-paced formats
Lead the Microsoft Copilot and AI training strategy, including onboarding, advanced use cases, responsible AI usage, and ongoing enablement
Partner with IT leadership to support new technology rollouts, system upgrades, and digital transformation initiatives
Create and maintain training content, including videos, guides, tutorials, and job aids
Identify skill gaps and develop targeted learning solutions to improve adoption and productivity
Gather feedback and measure training effectiveness to continuously improve programs.
