Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Capital One

Location:
United States, New York, New York

Contract Type:
Not provided

Salary:

179400.00 - 245600.00 USD / Year

Job Description:

Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a big group of makers, breakers, doers and disruptors who love to solve real problems and meet real customer needs. We are seeking experienced DevOps Engineers who are passionate about platform engineering to join our team.

As an Engineer on our team, you'll be at the forefront of driving a major transformation within Capital One. Our mission is to build a platform that developers love to use, even as we scale to support a massive multi-tenant environment. You will play a critical role in reducing cognitive load for the wider organization by codifying operational knowledge into custom Kubernetes Operators and automation tools written in Go. By joining this team, you will effectively be building the 'internal product' that powers the company, turning the friction of navigating a large fleet of clusters into a streamlined, automated experience that accelerates innovation.

Job Responsibility:

  • Lead a portfolio of diverse technology projects, drawing on deep experience in platform engineering, machine learning, distributed microservices, and full stack systems, to create solutions that help meet the company's regulatory needs
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers and deliver robust cloud-based solutions that drive powerful experiences to help millions of customers achieve financial empowerment
  • Utilize programming languages such as Python and Go (Golang), along with containerization and orchestration tools including Docker and Kubernetes, configuration management and infrastructure-as-code tools including Ansible and Terraform, and a variety of AWS tools and services

Requirements:

  • Bachelor’s degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience

Nice to have:

  • 7+ years of DevOps and Platform Engineering experience
  • 4+ years of experience with coding and scripting (Python, SQL, Java, JavaScript, Golang, Bash, Perl or Ruby)
  • 4+ years of experience with cloud orchestration tooling and related technologies such as Kubernetes, Helm, Argo CD, Crossplane, and AWS Controllers for Kubernetes (ACK)
  • 4+ years of experience using build and deployment tools (Jenkins, Docker)
  • 2+ years of experience with deploying clustered web services
  • 2+ years of experience with building custom Kubernetes operators and controllers using frameworks like Kubebuilder
  • 2+ years of experience with Kubernetes CNI and service mesh tooling such as Istio, Cilium, or Linkerd
  • 2+ years of experience in Agile practices

What we offer:

  • Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI)
  • A comprehensive, competitive, and inclusive set of health, financial, and other benefits that support your total well-being

Additional Information:

Job Posted:
March 25, 2026

Employment Type:
Full-time

Similar Jobs for Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Engineering Lead Analyst

Engineering Lead Analyst position in Citi's Cloud Technology Services (CTS) team...
Location:
India, Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 12+ years of relevant experience in an engineering role
  • Deep understanding of public cloud services adoption at scale
  • Expert-level understanding of AWS/GCP cloud networking across internet application hosting, B2B connectivity, and application resiliency
  • Hands-on Infrastructure as Code (IaC) expertise with Python and Go
  • CI/CD experience with Terraform, Harness, Tekton, Jenkins, etc.
  • Testing automation experience with Terratest, Cucumber, pytest-bdd, AWS Fault Injection Simulator (FIS), Chaos Mesh, etc.
  • Familiarity with Agile Development, DevOps, and SRE practices
  • Demonstrated ability to quickly learn new technologies and adapt to changing project requirements
  • Experience evaluating complex requirements and rationalizing them into a consistent service offering
  • Excellent communication skills
Job Responsibility:
  • Technical Expertise: hands-on technical contribution within the product team focused on public cloud networking
  • Collaborative Development: contribute to a team of cloud engineers and full-stack software developers
  • Automation: Identify and develop automation initiatives to improve processes related to public cloud services consumption
  • Cross-Functional Partnership: collaborate with teams across Citi's technology landscape
  • Engineering Excellence: contribute to defining and measuring success criteria for service availability and reliability
  • Compliance Advocacy: ensure adherence to relevant standards, policies, and regulations
  • Serve as technology subject matter expert for internal and external stakeholders
  • Provide direction for firm mandated controls and compliance initiatives
  • Define necessary system enhancements to deploy new products and process enhancements
  • Recommend product customization for system integration
What we offer:
  • Career growth opportunities
  • Opportunity to give back to the community
  • Make a real impact
  • Global team environment
  • Well-being support
  • Work-life balance programs
Employment Type:
Full-time

Associate Head - Software Engineering

Alter Domus India develops and licenses a growing family of proprietary software...
Location:
India, Hyderabad
Salary:
Not provided
Alter Domus
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in computer science or a related field (or equivalent work experience)
  • Seasoned senior engineering manager with 14+ years of experience managing teams and global stakeholders
  • Strong professional experience in full stack development, with a focus on Angular, .NET, and .NET Core
  • Very strong expertise in developing and integrating RESTful APIs, with a deep understanding of asynchronous request handling
  • Strong understanding of technology architectures, programming, databases, and cloud computing
  • Cloud platform-agnostic skills are preferred, enabling flexibility in technology selection
  • Excellent leadership, communication, and interpersonal skills to effectively manage teams and collaborate with stakeholders
  • Ability to identify problems, analyze data, and develop effective solutions that meet business needs
  • Proven experience in managing multiple projects simultaneously, overseeing implementation, and ensuring successful delivery
  • Ability to think strategically, develop long-term plans, and make decisions that align with business objectives
Job Responsibility:
  • Develop and implement technology transformation strategies that align with business goals
  • Identify areas for improvement and propose innovative technologies to enhance operational efficiency
  • Design and oversee the implementation of new architectures across application, data, integration, and security domains
  • Lead the design and delivery of technology solutions that meet business needs and adhere to industry standards
  • Collaborate with cross-functional teams and clients to understand requirements and translate them into effective technical solutions
  • Evaluate and recommend new technologies, tools, and platforms to support business transformation efforts
  • Promote a culture of continuous improvement, innovation, and upskilling within the team
  • Oversee the implementation of new technologies and solutions, managing project timelines and budgets to ensure successful delivery across multiple projects simultaneously
  • Continuously monitor and optimize technology performance, identifying areas for improvement and implementing strategies to enhance efficiency
  • Provide mentorship and guidance to junior engineers and team members
What we offer:
  • Support for professional accreditations such as ACCA and study leave
  • Flexible arrangements, generous holidays, birthday leave
  • Continuous mentoring along your career progression
  • Active sports, events and social committees across our offices
  • 24/7 mental, physical, emotional, and financial support through our Employee Assistance Program
  • The opportunity to invest in our growth and success through our Employee Share Plan
  • Plus additional local benefits depending on your location

Software Engineer Sr Staff - Platforms Developer

Designs, develops, troubleshoots and debugs software programs for software enhan...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or master’s degree in computer science, electronics, telecommunication engineering, or a related discipline
  • 14 to 19 years of experience in networking and system software development
  • Proficiency in C and C++ programming
  • Familiarity with data structures and system debugging techniques
  • Expertise in host complex, system peripherals, and drivers: CPU complex (x86); PCIe, SPI, I2C, and MDIO; FPGA, CPLD, and flash drivers
  • Expertise in Ethernet interfaces (from 1G to 400G and beyond, including 800G and 1.6T), MACsec, timing, and optics (SFP, QSFP, QDD, OSFP)
  • Expertise in High-speed packet forwarding with network processors, PHYs, and SerDes
  • Cloud Architectures
Job Responsibility:
  • Collaborate with product managers, architects, and other engineers to define software requirements and specifications
  • Design, implement, and maintain networking and system software components using C and C++ programming languages
  • Conduct object-oriented analysis and design to ensure robust and scalable solutions
  • Debug complex system-level issues, leveraging your deep understanding of fundamental OS concepts (especially in Linux or similar operating systems)
  • Participate in hardware and system-level design discussions, ensuring carrier-class software development
  • Work with Linux device drivers, system bring-up, and the Linux kernel
  • Navigate large codebases effectively
  • Apply strong technical, analytical, and problem-solving skills to enhance software performance and resilience
  • Utilize scripting technologies and modern DevOps practices
  • Collaborate with cross-functional teams, including networking, embedded platform software, and hardware experts
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

Site Reliability Engineer

Corporate Tools is looking for a Site Reliability Engineer. You will be a tradit...
Location:
United States
Salary:
175000.00 USD / Year
Corporate Tools
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience
  • 5+ years of experience in software engineering
  • 2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles
  • Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi
  • Strong proficiency with Kubernetes, Docker, and container orchestration in production environments
  • Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic
  • Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts
  • Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached)
  • Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement
  • Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis
Job Responsibility:
  • Stop problems before they start
  • Fix issues quickly and learn from them
  • Help keep systems steady, secure, and running
  • Work closely with DevOps engineers to build out tools and automation
  • Take ownership
What we offer:
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days of paid time off (PTO) accrued annually, plus 4 holidays
  • After 3 years, PTO increases to 29 days
  • Employees transition to flexible time off after 5 years with the company: not accrued, not capped, and taken whenever you want
  • Paid parental leave
  • 401(k) with up to 6% company match and no vesting period
  • Quarterly allowance
  • Open concept office with friendly coworkers
  • Creative environment where you can make a difference
Employment Type:
Full-time

Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering). Do you...
Location:
United States, McLean; Plano; Richmond
Salary:
179400.00 - 225100.00 USD / Year
Capital One
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience
Job Responsibility:
  • Lead a portfolio of diverse technology projects and a team of developers with deep experience in machine learning, distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
  • Utilize programming languages such as Java, Python, SQL, Ruby, and Go; containerization and orchestration tools including Docker and Kubernetes; configuration management and infrastructure-as-code tools including Ansible and Terraform; and a variety of AWS tools and services
What we offer:
  • Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI)
  • A comprehensive, competitive, and inclusive set of health, financial, and other benefits that support your total well-being
Employment Type:
Full-time

Founding Engineering Manager

Our company is seeking an experienced Software Engineering Manager to lead and e...
Location:
United States, Brentwood
Salary:
Not provided
Robert Half
Expiration Date:
Until further notice
Requirements:
  • 7+ years of detail-oriented software engineering experience
  • 3+ years leading or managing software engineering teams
  • Proven record in building, scaling, or transforming engineering organizations
  • Hands-on experience owning and supporting production systems
  • Deep understanding of modern software development and operational practices (e.g., Git, CI/CD, DevOps, cloud infrastructure, software lifecycle)
  • Strong leadership, communication, and organizational capabilities
  • Experience partnering with cross-functional teams and business stakeholders
  • Ability to commute to Long Island, NY under a hybrid work model
Job Responsibility:
  • Engineering Organization Leadership: Establish and oversee the engineering organization, including hiring, mentoring, and managing a multi-disciplinary team
  • Cultivate a culture based on ownership, accountability, operational excellence, and continual improvement
  • Develop and refine engineering processes, organizational structure, and technical best practices
  • Align engineering objectives with business strategy in partnership with senior leaders and stakeholders
  • Production Reliability & Operational Ownership: Champion the reliability and stability of production systems supporting critical operations
  • Implement and manage support processes, including incident response, on-call rotations, and root-cause analysis
  • Develop and execute operational excellence programs emphasizing monitoring, scalability, and performance optimization
  • Foster a sense of engineering accountability for production system continuity and resilience
  • Global Engineering Collaboration: Drive effective coordination and communication with distributed engineering teams in a global technology environment
  • Synchronize development and operational efforts across regions, ensuring alignment and best practices
What we offer:
  • Medical
  • Vision
  • Dental
  • Life and disability insurance
  • 401(k) plan

Founding Engineering Manager

Our company is seeking an experienced Software Engineering Manager to lead and e...
Location:
United States, Edgewood, NY
Salary:
Not provided
Robert Half
Expiration Date:
Until further notice
Requirements:
  • 7+ years of detail-oriented software engineering experience
  • 3+ years leading or managing software engineering teams
  • Proven record in building, scaling, or transforming engineering organizations
  • Hands-on experience owning and supporting production systems
  • Deep understanding of modern software development and operational practices (e.g., Git, CI/CD, DevOps, cloud infrastructure, software lifecycle)
  • Strong leadership, communication, and organizational capabilities
  • Experience partnering with cross-functional teams and business stakeholders
  • Ability to commute to Long Island, NY under a hybrid work model
Job Responsibility:
  • Engineering Organization Leadership: Establish and oversee the engineering organization, including hiring, mentoring, and managing a multi-disciplinary team
  • Cultivate a culture based on ownership, accountability, operational excellence, and continual improvement
  • Develop and refine engineering processes, organizational structure, and technical best practices
  • Align engineering objectives with business strategy in partnership with senior leaders and stakeholders
  • Production Reliability & Operational Ownership: Champion the reliability and stability of production systems supporting critical operations
  • Implement and manage support processes, including incident response, on-call rotations, and root-cause analysis
  • Develop and execute operational excellence programs emphasizing monitoring, scalability, and performance optimization
  • Foster a sense of engineering accountability for production system continuity and resilience
  • Global Engineering Collaboration: Drive effective coordination and communication with distributed engineering teams in a global technology environment
  • Synchronize development and operational efforts across regions, ensuring alignment and best practices
What we offer:
  • Medical, vision, dental, and life and disability insurance
  • Eligibility to enroll in the company 401(k) plan

Sr SRE

Location:
India, Putlibowli
Salary:
Not provided
Randstad
Expiration Date:
March 30, 2026
Requirements:
  • Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, Ansible, Dynatrace
  • Build and manage CI/CD pipelines
  • Improve infrastructure provisioning and configuration through automation
  • Monitor the health, performance, and reliability of production systems and applications
  • Design, implement, and maintain automated monitoring solutions, using tools such as Datadog
  • Define and monitor service level objectives (SLOs), service level indicators (SLIs), and error budgets
  • Implement effective alerting systems
  • Lead root cause analysis (RCA) and post-mortem investigations
  • Respond to production incidents, diagnose root causes, and implement corrective actions
  • Create and maintain playbooks and documentation for incident response
Job Responsibility:
  • Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, Ansible, Dynatrace to automate deployment and management of infrastructure
  • Build and manage CI/CD pipelines to ensure efficient and reliable application deployments
  • Improve infrastructure provisioning and configuration through automation, minimizing manual interventions and reducing human error
  • Monitor the health, performance, and reliability of production systems and applications
  • Design, implement, and maintain automated monitoring solutions, using tools such as Datadog
  • Define and monitor service level objectives (SLOs), service level indicators (SLIs), and error budgets to ensure system reliability and availability meet customer expectations
  • Implement effective alerting systems to identify and address potential issues before they impact users
  • Lead root cause analysis (RCA) and post-mortem investigations after incidents to identify improvements and avoid recurrence
  • Respond to production incidents, diagnose root causes, and implement corrective actions
  • Create and maintain playbooks and documentation for incident response, troubleshooting, and recovery processes
Employment Type:
Full-time