Senior SRE Manager Job at Aiven Deutschland GmbH (Auckland)

Senior Engineering Manager, SRE

Abridge’s services and engineering teams are in hyperscale mode, and multiplying...

Location

United States, San Francisco; New York; Pittsburgh

Salary:

250000.00 - 290000.00 USD / Year

Abridge

Expiration Date

Until further notice

Requirements

6+ years as a manager in rapidly growing organizations including at least 1 year as a manager of managers
Seeking an extremely challenging role that will push you beyond your limits, where failures are inevitable and not to be feared
Seeking a senior leadership role to develop people, environments, and impact - not ego, accolades, or ladder climbing
Able to ask for help, fail fast, and admit defeat; get yourself and others out of their comfort zone
Track record of leading performance engineering including load test and chaos engineering, large scale distributed telemetry implementation, major architectural and software refactors, engineering velocity, and full stack development
Experience running production workloads in more than one cloud provider (at a time, or across your experience)
Experience managing workloads across containerized solutions, Kubernetes, and CNCF-approved tooling such as Argo, istio, OTel, and more
Thought leader in platform building, with a strong desire to represent Abridge as a reliability engineering leader in the tech industry
Genuine passion for Abridge’s mission to improve healthcare in America and across the world

Job Responsibility

Visionary leadership: Scope, resource, evangelize, and execute a company-wide reliability and engineering velocity roadmap across environments and clouds, real-time streaming infrastructure under immense scale, compute as well as AI-at-edge infrastructure, and the most ambitious cloud security roadmap in the entire tech industry
Collaborate with department heads across product engineering, security, product management, commercial, and more to develop, align, and execute an extremely ambitious strategic roadmap
Gifted tactician: Work at the level of small tiger teams to unblock, enable, and drive execution and solutioning
Juggle several ambiguous and tricky problems at a time
Recruiter extraordinaire: Scale out your team to meet this roadmap - both ICs and managers
Attract top talent and hire quickly while maintaining a consistently high bar
Iterate on the hiring process along with other leaders, improve diversity and equity, retain and maximize the effectiveness of an extremely senior team, and make strategic bets on the people that will take us to the next level
Mentor to the mentors: Develop their careers, create top-of-ladder development opportunities, and continuously raise the bar for your staff as well as your peers and leaders in their abilities and awareness
Earn their trust, lead by example, be a doctor rather than a judge for organizational and people challenges, and help establish and maintain a hivemind, de-siloed culture across all engineering pods

What we offer

Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees
Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families
Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA
Paid Parental Leave: Generous paid parental leave for all full-time employees
Family Forming Benefits: Resources and financial support to help you build your family
401(k) Matching: Contribution matching to help invest in your future
Personal Device Allowance: Tax free funds for personal device usage
Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits
Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more
Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals

Fulltime

Senior Manager, Hybrid Services & Reliability (SRE)

As the Senior Engineering Manager for Hybrid Services & Reliability (HSR) within...

Location

United States, Austin, Texas; Sunnyvale, California

Salary:

201600.00 - 302000.00 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

Extensive background in Site Reliability Engineering (SRE) and defining SLO/SLI frameworks for hybrid cloud environments
Technical proficiency in managing on-prem Linux utilities (DHCP/PXE/NTP) and core development services
Opinionated view on automated observability, incident response, and MTTR reduction
Proven leadership experience

Job Responsibility

Reliability Engineering: Define, measure, and enforce strict SLOs/SLIs for critical hybrid cloud services, including network connectivity and compute readiness
Foundational Utilities: Own and manage core on-prem utilities, such as DHCP, PXE, and CDN, to ensure seamless server auto-provisioning across the global fleet
Environment Integrity: Manage the entire data flow path, from initial ingestion at the test bench through the secure cloud network into production staging
HIL Readiness: Guarantee the 99%+ availability and stability of remote CI-based Hardware-in-the-Loop (HIL) benches required for AV safety validation
Organization Growth: Actively lead the recruitment and technical mentorship of Senior and Staff ICs as part of the team's expansion

What we offer

medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts
relocation benefits

Fulltime

Systems Operations Senior Manager

Location

India, Bengaluru

Salary:

Not provided

Wells Fargo

Expiration Date

July 09, 2026

Requirements

7+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of management or leadership experience

Job Responsibility

Manage and develop teams of analysts, associates, and less experienced managers in roles that provide technical services and support for the relevant supported systems
Engage and influence stakeholders, internal partners, and peers in order to engineer projects, identify new products and solutions, and research solutions for existing systems
Identify and recommend opportunities for administration and maintenance of the remote monitoring and management system, as well as the periodic system review
Perform network assessments, security audits, and system enhancement consultations
Determine appropriate strategy and actions of Systems Operations team to meet moderate to high risk deliverables
Interpret and develop policies and procedures, and understand compliance and risk management requirements for supported system area
Provide implementation support for key risk initiatives
Collaborate with and influence all levels of professionals, analysts, or associates
Ensure the Systems Operations team communicates with customers to keep them informed of incident progress, and notify them of impending changes or agreed outages
Manage allocation of people and financial resources for Systems Operations

Fulltime

Sr. Manager SRE

We're building a Site Reliability Engineering center in Mexico City, and we're h...

Location

Mexico, Mexico City

Salary:

Not provided

Capital One

Expiration Date

Until further notice

Requirements

Professional English fluency
Bachelor's degree
At least 8 years of experience in SRE, production operations, or reliability engineering
Experience in DevOps Engineering (internship experience does not apply)
8+ years of experience in at least one of the following: Java, Python, Go
At least 6 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
5+ years of experience with container orchestration services including Docker or Kubernetes
Experience with Shell or Bash scripting
At least 5 years of Unix or Linux system administration experience

Job Responsibility

Define and maintain a 12-18 month technical vision and roadmap for GPN SRE in Mexico City - decompose destination architecture into deliverable steps, sequence investments, and align execution across teams
Drive reliability transformation across settlement, observability, and automation domains - establish SLOs, error budgets, severity frameworks, and operational standards that teams build against
Pioneer AI and agentic automation approaches - design and build AI-driven solutions (using Claude Code, Copilot CLI, and LLM frameworks) for alert classification, runbook generation, automated remediation, and incident analysis; set patterns that other engineers extend
Own the technical strategy for domain-specific knowledge ramp-up: identify which domain expertise requires deep engineering investment vs. documentation, and architect systems that reduce reliance on tribal knowledge
Lead cross-team technical initiatives - drive observability platform convergence, standardize on COF tooling, and eliminate arbitrary uniqueness across towers
Serve as the senior escalation point for complex production incidents - diagnose cascading failures across distributed systems (storage, network, application), drive resolution, and ensure durable fixes land
Architect automation for high-risk operational processes - certificate rotation, compliance artifact generation, settlement cycle validation - ensuring security and reliability are built in from design
Mentor and elevate engineers across teams - conduct design reviews, establish engineering standards, coach on debugging and system thinking, and create an environment where Principal Associates and Managers grow into domain experts
Introduce and advocate for engineering practices that raise the bar - AI engineering, innersourcing, reuse over rebuild, open source contribution, blameless postmortems, and chaos engineering

Fulltime

Systems Operations Senior Manager

Wells Fargo is seeking a Systems Operations Senior Manager to lead production st...

Location

United States, Charlotte

Salary:

Not provided

Wells Fargo

Expiration Date

Until further notice

Requirements

7+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of SRE management or leadership experience
2+ years of experience in a financial crimes production environment

Job Responsibility

Manage and develop teams of analysts, associates, and less experienced managers in roles that provide technical services and support for the relevant supported systems
Engage and influence stakeholders, internal partners, and peers in order to engineer projects, identify new products and solutions, and research solutions for existing systems
Identify and recommend opportunities for administration and maintenance of the remote monitoring and management system, as well as the periodic system review
Perform network assessments, security audits, and system enhancement consultations
Determine appropriate strategy and actions of Systems Operations team to meet moderate to high risk deliverables
Interpret and develop policies and procedures, and understand compliance and risk management requirements for supported system area
Provide implementation support for key risk initiatives
Collaborate with and influence all levels of professionals, analysts, or associates
Ensure the Systems Operations team communicates with customers to keep them informed of incident progress, and notify them of impending changes or agreed outages
Manage allocation of people and financial resources for Systems Operations

Fulltime

Senior SRE – Data & Middleware Observability & Incident Reduction Vice President

The Senior Incident Operations & Optimization Specialist for Data & Middleware i...

Location

United States, Irving

Salary:

125760.00 - 188640.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

A minimum of 8 years of hands-on experience in database administration, middleware engineering, or enterprise data platform operations
Proven experience in event management, alert tuning, and incident reduction for data and middleware services, with measurable results
Direct, hands-on experience with modern AIOps and event management platforms
Deep knowledge of both relational (e.g., Oracle, SQL Server) and NoSQL (e.g., MongoDB) database technologies, including clustering, replication, and performance tuning
Expertise in middleware platforms, including messaging technologies (e.g., MQ, Kafka) and application servers (e.g., WebSphere, Tomcat)
Hands-on experience developing robust automation solutions using relevant scripting languages (e.g., Python, Shell) and modern automation frameworks
Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms
Excellent analytical abilities with a systematic approach to troubleshooting complex data platform architectures and correlating infrastructure issues with application impact
Exceptional communication skills with the ability to collaborate effectively with DBAs, middleware engineers, and application teams, and to present technical concepts to diverse audiences
Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field

Job Responsibility

Analyze and optimize monitoring across all database and middleware platforms to address high-volume, low-value alerts, identify patterns in incident generation, and determine root causes
Develop and implement domain-specific correlation, de-duplication, and suppression rules on AIOps and event management platforms
Create logic that understands database cluster relationships, messaging dependencies, and application-to-database connections
Architect and develop automation playbooks for incident data enrichment and automated remediation of common database and middleware issues, such as connection pool resets or service restarts
Identify monitoring gaps across the data and middleware landscape, proposing enhancements to ensure comprehensive health monitoring and address blind spots in transactional flows
Partner closely with Database Administration (DBA), middleware engineering, and application teams to validate correlation logic, build consensus on threshold changes, and provide expert guidance on event management best practices
Continuously validate the effectiveness of implemented rules and automation, ensuring critical health indicators remain highly visible
Lead post-implementation reviews and drive iterative improvements

What we offer

medical, dental & vision coverage
401(k)
life, accident, and disability insurance
wellness programs
paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays

Fulltime

Senior Manager, Software Development

We are seeking a Senior Manager, Software Development to lead the evolution of P...

Location

Lithuania, Kaunas; Vilnius

Salary:

5100.00 EUR / Month

Bentley Systems

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Science or Engineering (or equivalent experience)
Proven experience in cloud-native software development, including transforming legacy on-prem solutions
Strong leadership skills with experience managing globally distributed teams
Deep understanding of DevOps/SRE principles, multi-cloud architectures, and complex technical challenges
Results-driven, strategic thinker with excellent communication and collaboration skills
High intellectual curiosity and adaptability to emerging technologies and trends
Coaching mindset with a passion for team development and continuous learning

Job Responsibility

Lead and Inspire: Manage a global development team to deliver highly available, scalable, and performant cloud services
Drive Transformation: Champion the shift to an 'Empowered Teams' model and instill a cloud-first mindset across the organization
Own Delivery Excellence: Ensure code quality, service uptime, and continuous improvement in engineering processes
Strategic Decision-Making: Use data and telemetry to guide product adoption, prioritize investments, and optimize costs
Collaborate Across Functions: Partner with Product Management, UX, and leadership to align on strategy and deliver commitments
Build Talent: Recruit, coach, and develop top engineering talent to create a world-class development organization

What we offer

A Great Team & Culture
Global Impact
Supportive Environment
Career Growth: Access to training programs, certifications, and industry conferences
Challenging & Meaningful Work
Purpose-Driven Mission
Flexible Work Options: Choose between office-based or remote work (offices in Vilnius and Kaunas)
Work-Life Balance: Additional annual leave days and extra paid time off for special occasions (marriage, moving day, bereavement, etc.)
Comprehensive Benefits: Health insurance and 24/7 accident coverage
Recognition & Rewards: Referral bonuses starting at €1500 gross, seniority bonuses, and colleague recognition awards

Fulltime

Senior Manager of Engineering

Docker seeks a Senior Manager of Engineering to build and lead a new AI Develope...

Location

United States, Seattle

Salary:

226600.00 - 318500.00 USD / Year

Docker

Expiration Date

Until further notice

Requirements

5+ years managing high-performing engineering teams, with demonstrated experience hiring, developing, and retaining diverse technical talent; experience building teams from scratch highly valued
5+ years as a software developer with hands-on experience building developer tools, platform engineering systems, DevOps, or SRE infrastructure
Strong understanding of AI/ML technologies, LLM integration patterns, and practical applications of AI in developer workflows; hands-on experience building AI-powered tools or agents preferred
Track record of building platforms or internal tools that enable other teams and measurably improve developer productivity
Deep technical knowledge of modern cloud-native infrastructure including Kubernetes, GitOps deployment patterns, observability systems, and CI/CD pipelines
Experience with infrastructure-as-code frameworks (Terraform, Pulumi) and cloud platforms (AWS, GCP, Azure)
Product mindset with ability to envision how internal tools can become commercial offerings; experience with productization of internal platforms a plus

Job Responsibility

Build and Scale the AI Developer Tools Team
Ship AI-Powered Developer Tools
Build Self-Service AI Developer Tools Platform
Drive Platform Adoption and Developer Experience
Partner on AI Strategy and Technology
Explore Productization Opportunities
Deliver Measurable Impact
Cross-Functional Collaboration
Operational Excellence
Team Development and Culture

What we offer

Freedom & flexibility: fit your work around your life
Designated quarterly Whaleness Days plus end of year Whaleness break
Home office setup
16 weeks of paid Parental leave
Technology stipend equivalent to $100 net/month
PTO plan
Training stipend for conferences, courses and classes
Equity
Docker Swag

Fulltime
