Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

United States, New York, New York · 179400.00 - 245600.00 USD / Year · Job Posted March 25, 2026

Job Description

Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a big group of makers, breakers, doers and disruptors, who love to solve real problems and meet real customer needs.

We are seeking experienced DevOps Engineers who are passionate about platform engineering to join our team. As an Engineer on our team, you’ll have the opportunity to be at the forefront of driving a major transformation within Capital One. Our mission is to build a platform that developers love to use, even as we scale to support a massive multi-tenant environment. You will play a critical role in reducing cognitive load for the wider organization by codifying operational knowledge into custom Kubernetes Operators and automation tools written in Go. By joining this team, you are effectively building the 'internal product' that powers the company, turning the friction of navigating a large fleet of clusters into a streamlined, automated experience that accelerates innovation.

Job Responsibility

  • Lead a portfolio of diverse technology projects with deep experience in platform engineering, machine learning, distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of customers achieve financial empowerment
  • Utilize programming languages like Python and Golang, along with container orchestration tools including Docker and Kubernetes, configuration management tools including Ansible and Terraform, and a variety of AWS tools and services

Requirements

  • Bachelor’s degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience

Nice to have

  • 7+ years of DevOps and Platform Engineering experience
  • 4+ years of experience with coding and scripting (Python, SQL, Java, JavaScript, Golang, Bash, Perl or Ruby)
  • 4+ years of experience with cloud orchestration tooling and related technologies (Kubernetes, Helm, ArgoCD, Crossplane, AWS ACK)
  • 4+ years of experience using build and deployment tools (Jenkins, Docker)
  • 2+ years of experience with deploying clustered web services
  • 2+ years of experience with building custom Kubernetes operators and controllers using frameworks like Kubebuilder
  • 2+ years of experience with Kubernetes CNI and Mesh tooling like Istio, Cilium or Linkerd
  • 2+ years of experience in Agile practices

What we offer

  • performance-based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • a comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being

Similar Jobs for Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

8 matching positions

Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Capital One
Location: United States, McLean; Plano; Richmond
Salary: 179400.00 - 225100.00 USD / Year
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience
Job Responsibility
  • Lead a portfolio of diverse technology projects and a team of developers with deep experience in machine learning, distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
  • Utilize programming languages like Java, Python, SQL, Ruby and Go, Container Orchestration services including Docker and Kubernetes, CM tools including Ansible and Terraform, and a variety of AWS tools and services
What we offer
  • performance-based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • a comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • Fulltime

Senior Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Capital One
Location: United States, McLean; Richmond
Salary: 209000.00 - 262400.00 USD / Year
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree
  • At least 6 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 4 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 6 years of Unix or Linux system administration experience
Job Responsibility
  • Work within and across Agile teams to design, develop, test, implement, and support technical solutions across full-stack development tools and technologies
  • Lead the craftsmanship, availability, resilience, and scalability of your solutions
  • Bring a passion to stay on top of tech trends, experiment with and learn new technologies, participate in internal & external technology communities, and mentor other members of the engineering community
  • Encourage innovation, implementation of cutting-edge technologies, inclusion, outside-of-the-box thinking, teamwork, self-organization, and diversity
  • Work across boundaries to improve the velocity of your and other teams
  • Lead efforts to enable and simplify the use of new and existing AWS services
  • Work with product managers to understand desired application and platform capabilities and testing scenarios
What we offer
  • performance-based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • a comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • Fulltime

Lead Software Engineer, DevOps (Azure)(Cloud Operations Resilience Engineering)

Capital One
Location: United States, McLean; Plano; Richmond
Salary: 179400.00 - 225100.00 USD / Year
Expiration Date: Until further notice
Requirements
  • Bachelor's degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience
Job Responsibility
  • Lead a portfolio of diverse technology projects and a team of developers with deep experience in machine learning, distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
  • Utilize programming languages like Python and Go, Container Orchestration services including Docker and Kubernetes, CM tools including Terraform, and a variety of AWS and Azure tools and services
What we offer
  • Performance-based incentive compensation
  • Health, financial and other benefits
  • Fulltime

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...
Palo Alto Networks
Location: United States, Santa Clara
Salary: 126000.00 - 203500.00 USD / Year
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
  • Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
  • Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
  • Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
  • Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
  • Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
  • Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
  • Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
  • Strong problem-solving skills and ability to work across teams
Job Responsibility
  • Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
  • Lead improvements across production systems, including performance, availability, and incident response
  • Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
  • Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
  • Partner with development teams to improve system reliability, observability, and cloud-native design patterns
  • Define and implement monitoring, alerting, and observability strategies across distributed systems
  • Lead incident response efforts, including root cause analysis and long-term remediation strategies
  • Identify and eliminate operational toil through automation and system improvements
  • Mentor engineers and contribute to raising the bar for production engineering practices
What we offer
  • restricted stock units
  • bonus
  • Fulltime

Digital Software Engineering Lead Analyst – Vice President

The Digital S/W Engineer Lead Analyst is a lead-level professional role. This in...
Citi
Location: India, Pune
Salary: Not provided
Expiration Date: Until further notice
Requirements
  • 7+ years of progressive software development experience, demonstrating expert-level proficiency in JavaScript and Java frameworks (e.g., React.js, Spring Boot), and databases (e.g., Oracle, MongoDB, PostgreSQL)
  • Expert in Modern Application Architecture: Mastery of modern application architecture principles, including microservices, event-driven architectures, serverless, and cloud-native patterns
  • Deep expertise in Data Structures, Algorithms, and Object-Oriented Design Principles with Java
  • Proven leadership in leveraging and integrating Artificial Intelligence (AI) and Machine Learning (ML) tools to optimize development workflows, enhance code quality, and drive intelligent features
  • Extensive experience with Microservices frameworks (e.g., Spring Boot, Quarkus), Event-Driven Services (e.g., Kafka, RabbitMQ), and advanced Cloud-Native Application Development (AWS, Azure, GCP)
  • Multiple years of experience leading the design and implementation of Service-Oriented and Microservices architectures, including advanced REST, GraphQL, and gRPC implementations
  • Full Stack Architecture & Leadership: Demonstrated ability to architect, design, develop, and maintain complex, enterprise-grade full-stack solutions, encompassing both front-end and back-end components of robust web applications, with an emphasis on scalability and performance
  • Front-End Expertise: Expert-level proficiency in designing and developing highly intuitive, performant, and accessible user interfaces using cutting-edge JavaScript frameworks (e.g., React, Angular, Vue), advanced HTML5, and CSS (e.g., SASS/LESS, CSS-in-JS)
  • Back-End Mastery: Extensive experience in architecting and developing scalable server-side logic and sophisticated APIs using languages such as Java, Python, or similar, with a focus on high-throughput and low-latency systems
  • Advanced Database & Data Architecture Expertise: Comprehensive knowledge of SQL and PL/SQL, with a deep understanding of Relational Database Management Systems (RDBMS), particularly Oracle, including advanced database design, performance tuning, data warehousing, and NoSQL databases
Job Responsibility
  • Strategic Technical Leadership: Provide expert guidance and strategic oversight across the entire software development lifecycle, partnering continuously with senior stakeholders to align technical solutions with business objectives
  • Architectural Stewardship: Lead the design and evolution of robust, scalable, and secure enterprise applications, defining architectural patterns and ensuring adherence to best practices in cutting-edge technologies and software design patterns
  • Team & Project Leadership: Drive complex engineering initiatives within Agile delivery teams, fostering a culture of collaboration, excellence, and continuous improvement. Lead sprint goal achievement, oversee code quality, and actively participate in and lead broader Citi technical communities and advanced Agile/Scrum processes
  • Mentorship & Coaching: Act as a technical mentor and coach for junior and intermediate engineers, fostering their growth, critical thinking, and advanced problem-solving capabilities
  • Advanced Problem Solving & Troubleshooting: Exhibit mastery in analyzing and resolving intricate coding, application performance, and design challenges. Lead cross-functional efforts to diagnose and troubleshoot complex system issues
  • Proactive Root Cause Analysis: Spearhead thorough investigations to identify systemic root causes of development and performance bottlenecks, leading the implementation of comprehensive, long-term defect resolutions and preventative measures
  • Technical Vision & Acumen: Demonstrate a profound and forward-looking understanding of technical requirements, emerging trends, and their strategic implications for solutions under development, ensuring future-proof designs
  • Containerization, Orchestration & Cloud Strategy: Drive the strategic adoption and optimization of Docker for application containerization, Kubernetes for efficient service orchestration, and other cloud-native technologies to build resilient and scalable infrastructure
  • Communication, Risk & Stakeholder Management: Master effective communication of progress, proactively anticipate and mitigate technical and project bottlenecks, provide expert escalation management, and adeptly identify, assess, track, and manage issues and risks at strategic and operational levels
  • Process and System Optimization: Champion and lead initiatives to streamline, automate, and eliminate redundant processes within architecture, build, delivery, production operations, and across various business areas, driving significant efficiency gains and innovation
  • Fulltime

Lead Software Engineer

Wells Fargo
Location: India, Hyderabad
Salary: Not provided
Expiration Date: Until further notice
Requirements
  • 5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Experience in Software Engineering, SRE, DevOps, or Platform Engineering
  • Strong proficiency in Python for automation and tooling
  • Hands‑on experience with Grafana, Prometheus, and Splunk in production environments
  • Solid understanding of SLIs, SLOs, dashboards, alerting, and observability best practices
  • Experience applying AI/ML concepts to monitoring, alerting, or operational analytics
  • Strong knowledge of Linux, networking, and distributed systems
  • Experience with Cloud platforms and Kubernetes/OpenShift
  • Proven experience leading incidents, RCAs, and reliability initiatives
  • Experience building custom Prometheus exporters or advanced Grafana dashboards
Job Responsibility
  • Lead complex technology initiatives including those that are companywide with broad impact
  • Act as a key participant in developing standards and companywide best practices for engineering complex and large scale technology solutions for technology engineering disciplines
  • Design, code, test, debug, and document for projects and programs
  • Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
  • Make decisions in developing standard and companywide best practices for engineering and technology solutions requiring understanding of industry best practices and new technologies, influencing and leading technology team to meet deliverables and drive new initiatives
  • Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
  • Lead projects, teams, or serve as a peer mentor
  • Own and improve availability, performance, scalability, and resilience of production systems
  • Define, monitor, and manage SLIs/SLOs and error budgets to guide reliability investments
  • Lead capacity planning, performance testing, failover readiness, and disaster‑recovery design
  • Fulltime

Lead Systems Software Engineer

Lead Systems Software Engineer role at Cloud Software Group (Citrix), focusing o...
Cloud Software Group
Location: United States, Ft. Lauderdale, Florida
Salary: Not provided
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree or equivalent in Computer Science or related field
  • Minimum 8 years of professional experience in software engineering using C++/C#
  • Experience with C also required
  • Solid foundation in systems programming, including algorithms, data structures, operating systems, and networking
  • Proven experience with memory management, multithreading, performance tuning, and cross-platform development
Job Responsibility
  • Own the engineering lifecycle for LTSR branches, delivering high-quality fixes, performance optimizations, and security updates tailored to enterprise stability requirements
  • Identify and implement targeted feature enhancements or backports that improve customer workflows, product integration, or overall usability within LTSR guidelines
  • Collaborate with Product Managers, Support, DevOps, and Security teams to prioritize and deliver customer-focused solutions that align with long-term product goals
  • Lead root cause investigations for complex customer-reported issues and implement long-term fixes that enhance product resilience
  • Promote and uphold high engineering standards through rigorous code reviews, testing, logging/instrumentation, and static/dynamic code analysis
  • Participate in Agile development practices, helping drive continuous improvement in both product and process
  • Maintain accurate and up-to-date technical documentation, including design specs, release notes, and engineering workflows

Lead Full-Stack Software Engineer

Intellias
Location: Poland; Ukraine
Salary: Not provided
Expiration Date: Until further notice
Requirements
  • Strong hands-on experience with modern backend development using .NET (C#)
  • Solid experience with frontend development using React or ability to effectively collaborate with frontend teams working in this stack
  • Strong experience designing enterprise APIs and integration contracts, including REST APIs, OpenAPI specifications, validation, versioning, and integration with external enterprise systems
  • Experience with relational data modelling, SQL performance considerations, and reporting/read-model design for operational and analytical consumers
  • Proven experience building and operating applications in Microsoft Azure, including: Azure Container Apps (or equivalent container platforms), Azure SQL Database, Azure networking (VNet, Private Endpoints)
  • Practical experience with containerised environments (Docker) and cloud-native deployment models
  • Strong understanding of cloud-native architecture principles, including: Stateless services, Scalability and resiliency patterns, Secure service-to-service communication
  • Experience working with CI/CD pipelines, preferably: Azure DevOps Pipelines, Git-based workflows
  • Experience implementing infrastructure as code, ideally using Bicep or similar tools
  • Good understanding of application security practices, including: Secure secret management (e.g., Azure Key Vault), Identity and access management (Entra ID / OIDC)
Job Responsibility
  • Lead design and development of scalable, cloud-native platform
  • Define and implement architecture-aligned solutions in collaboration with Solution Architects and client stakeholders
  • Drive engineering best practices, including code quality, testing, CI/CD, and security
  • Contribute to technical decision-making and architecture discussions (ADR definition and alignment)
  • Mentor and guide engineers, supporting team growth and delivery excellence
  • Ensure alignment with AI-first, spec-driven development approach and engineering workflows
  • Collaborate closely with distributed teams, including DevOps, QA, Product Manager/Business Analyst, and client stakeholders
  • Support integration development with enterprise systems (e.g., SAP, ServiceNow, Data Lake)
  • Participate in end-to-end delivery, from discovery and design to implementation and release
  • Ensure adherence to agreed SDLC processes and release workflows
  • Fulltime