CrawlJobs Logo

Lead Software Engineer - SRE

https://www.wellsfargo.com/ Logo

Wells Fargo

Location Icon

Location:
India , Bengaluru

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

Wells Fargo is seeking a Lead Software Engineer – SRE

Job Responsibility:

  • Lead complex technology initiatives including those that are companywide with broad impact
  • Act as a key participant in developing standards and companywide best practices for engineering complex and large scale technology solutions for technology engineering disciplines
  • Design, code, test, debug, and document for projects and programs
  • Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
  • Make decisions in developing standard and companywide best practices for engineering and technology solutions requiring understanding of industry best practices and new technologies, influencing and leading technology team to meet deliverables and drive new initiatives
  • Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
  • Lead projects, teams, or serve as a peer mentor
  • Develop high-quality, scalable, and maintainable Python code for backend services, APIs, and applications
  • Architect and implement robust solutions that meet business requirements and Non-functional requirements
  • Participate in code reviews and provide constructive feedback to ensure code quality and code hygiene
  • Write unit and integration tests to ensure code reliability and stability
  • Lead technical discussions and contribute to architectural decisions
  • Mentor and guide junior engineers, fostering their professional growth
  • Champion best practices in software development, including coding standards, testing, and deployment
  • Evaluate and recommend new technologies and tools to improve development efficiency
  • Collaborate effectively with product owner, designers, and other engineers to deliver high-quality software
  • Communicate technical concepts clearly and concisely to both technical and non-technical stakeholders
  • Participate in agile development processes, including sprint planning, daily stand-ups, and retrospectives
  • Identify and resolve performance bottlenecks and optimize code for efficiency
  • Monitor and troubleshoot production systems to ensure uptime and reliability
  • Implement and maintain CI/CD pipelines for automated testing and deployment
  • Design and implement efficient database schemas and queries (SQL or NoSQL)
  • Build and consume RESTful or GraphQL APIs
  • Work with message queues and other asynchronous processing systems

Requirements:

  • 5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field
  • 4+ years of professional experience in Python software development
  • Strong proficiency in Python and related frameworks (e.g., Django, Flask, FastAPI)
  • Extensive experience with relational and/or NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB)
  • Solid understanding of software design principles and patterns
  • Experience with version control systems (e.g., Git)
  • Experience with cloud platforms (e.g., AWS, Azure, GCP)
  • Experience with containerization and orchestration (e.g. Docker, Kubernetes, Openshift)
  • Strong knowledge of RESTful API design
  • Excellent problem-solving and debugging skills
  • Strong communication and collaboration skills
  • Good understanding and hands on experience on UI development framework (React / Angular)

Nice to have:

  • Experience with asynchronous programming (e.g., asyncio)
  • Experience with message queues (e.g., RabbitMQ, Kafka)
  • Experience with CI/CD pipelines (e.g., Jenkins, GitLab CI, Git Action)
  • Experience with GraphQL
  • Experience using Infrastructure as Code tools (e.g. Ansible, Terraform, CloudFormation)
  • Good understanding about SRE practices (Reliability, Observability, Availability, TOIL Automation)
  • Experience working with Observability tools (Splunk, Grafana, Prometheus, ELK)
  • Knowledge and good understanding on Reliability concepts (SLI, SLO, SLI, Error budget)

Additional Information:

Job Posted:
April 01, 2026

Expiration:
April 12, 2026

Employment Type:
Fulltime
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Lead Software Engineer - SRE

Intermediate Software Engineer SRE – AI

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location
Location
Canada , Mississauga
Salary
Salary:
115000.00 - 128000.00 CAD / Year
pointclickcare.com Logo
PointClickCare
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years' experience in software engineering
  • Experience with SRE principles
  • Experience with AI/ML in production environments
  • A passion for automation, intelligent systems, and operational excellence
  • Strong debugging, problem-solving, and system design skills
  • Languages: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
  • CI/CD: Jenkins, ArgoCD, Spinnaker
Job Responsibility
Job Responsibility
  • Build ML-based anomaly detection and pattern recognition systems
  • Enhance telemetry with smart tagging and metadata for better AI insights
  • Develop event-driven workflows and self-healing systems using AI triggers
  • Automate incident response with generative AI and custom AI agent orchestration
  • Use time-series forecasting and predictive modelling to anticipate failures
  • Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
  • Build scalable, fault-tolerant systems in a cloud-native environment
  • Participate in on-call rotations and lead incident response for critical systems
  • Skilled in API integration for streamlined data exchange and system connectivity
  • Run internal AIOps workshops and help teams adopt AI maturity models
What we offer
What we offer
  • Benefits starting from Day 1
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more
  • Fulltime
Read More
Arrow Right

Senior Software Engineer, Infrastructure

You’ll help shape the future of infrastructure automation for law enforcement sy...
Location
Location
United States , Seattle; Boston
Salary
Salary:
141000.00 - 225600.00 USD / Year
axon.com Logo
Axon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 8+ years of professional software development experience
  • Strong background building cloud-native, distributed solutions
  • Experience designing tooling and automation to simplify the operational management of SaaS/PaaS systems
  • Proficiency in backend services with multiple managed languages (e.g., Java, Scala, Go, C#, or similar)
  • Expertise with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and building modular, reusable, testable components
  • Familiarity with Kubernetes platforms (e.g., AKS, EKS, or similar)
  • Hands-on experience with CI/CD platforms for automating infrastructure, builds, testing, and releases
  • Strong collaboration and communication skills, with empathy for the needs of engineering teams
Job Responsibility
Job Responsibility
  • Lead engineering architecture design reviews
  • Set a high technical bar for the team through code and architecture design reviews
  • Mentoring engineers
  • Working across teams with Product, Design, and Engineering to create integrated solutions that delight our customers
  • Improve our Engineering process, including long-term thinking, sprint planning and stand-ups
  • Building services that adhere to our high bar on availability and latency in this mission-critical space
  • Working with the latest open source technologies
What we offer
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
  • Fulltime
Read More
Arrow Right

Site Reliability Engineering Support Lead

Site Reliability Engineering Support Lead role focused on application support, d...
Location
Location
Ireland , Dublin
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Solid SRE process experience
  • 5+ years of Leading high-performance, 24x7, DevOps or SysOps team
  • Proficiency in Windows administration, Office 365, Exchange, SharePoint, Active Directory, Backup, Networking and Infrastructure
  • Experience with Microsoft OS Windows & Server
  • Experience in ticket tracking and resolving on time
  • Hands-on experience on ticketing tools (ServiceNow)
  • Excellent verbal, written, presentation and interpersonal communication skills
  • Ability to make complex technical matters easy-to-comprehend for non-technical persons.
Job Responsibility
Job Responsibility
  • Taking end-to-end Ownership of Application Support for Production Systems Issues resolution
  • Implementing, monitoring, and maintaining CI/CD frameworks
  • Developing new capabilities, coordinating implementation across a large number of teams including infrastructure, developer tools and information security
  • Influencing a culture of Site Reliability Engineering. Engaging in training and mentoring to help develop other engineers with SRE mind set
  • Providing the first line of after-deployment technical support at L1 and L2 level for applications and and/or associated production systems diagnostics, and network health monitoring
  • Coordination and/or for deploying hands-on fixes, patches and software updates at the application level, and as appropriate at the network level
  • Managing a team of technical support engineers who provide technical support to users
  • Escalating complex problems to the L3 level of expertise within organization, along with observations from investigative and diagnostic assessments
  • Co-ordinating in the investigation of repeated technical issues affecting user system and seeing through to resolution
  • Escalating, resolving, guiding team, and tracking production incidents to closure
What we offer
What we offer
  • Competitive base salary (which is annually reviewed)
  • Hybrid working model (up to 2 days working at home per week)
  • Additional benefits to support you and your family to be well, live well and save well.
  • Fulltime
Read More
Arrow Right

Software Engineer, Site Reliability

As a Site Reliability Engineer (SRE) at Fireworks AI, you will play a critical r...
Location
Location
United States , San Mateo
Salary
Salary:
Not provided
fireworks.ai Logo
Fireworks AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale production systems
  • Deep expertise in SRE principles and practices, including SLOs, SLIs, operational automation, incident management, and post-mortems
  • Extensive hands-on experience with public cloud platforms (AWS, GCP, Azure), including compute, networking, storage, and database services
  • Strong experience with containerization technologies (Docker) and orchestration platforms (Kubernetes)
  • Proficiency in designing and implementing robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, ELK stack, and distributed tracing
  • Solid programming/scripting skills in at least one language (e.g., Python, Go) for automation and tool development
  • In-depth knowledge of Linux operating systems, networking fundamentals, and system debugging
  • Proven ability to troubleshoot complex issues across the entire stack
  • Excellent communication, collaboration, and problem-solving skills
Job Responsibility
Job Responsibility
  • Ensuring System Reliability: Ensure systems are designed and implemented with high availability, scalability, and performance. Focus on fault tolerance, disaster recovery, identifying and removing scaling bottlenecks, and performance optimization across our multi-cloud infrastructure
  • Incident Management & Response: Lead efforts in incident detection, response, and resolution for critical production issues. Drive post-mortems to identify root causes and implement preventative measures to improve system reliability
  • Observability & Monitoring: Develop, implement, and maintain comprehensive monitoring, alerting, logging, and tracing solutions to provide deep insights into system health and performance
  • Automation & Toil Reduction: Identify and automate repetitive operational tasks to reduce toil and improve operational efficiency. Develop tools and scripts to streamline deployments, scaling, and system management
  • Capacity Planning & Performance Tuning: Work proactively on capacity planning to ensure our infrastructure can gracefully handle growth and peak loads. Optimize system performance and resource utilization
  • Reliability Best Practices: Collaborate with software engineers to embed reliability principles (e.g., SLOs, SLIs, error budgets) into the development lifecycle, promoting a culture of operational excellence
  • On-call Rotation: Participate in a periodic on-call rotation to support our production environment and respond to critical alerts
  • Fulltime
Read More
Arrow Right

Lead Site Reliability Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location
Location
India , Bangalore
Salary
Salary:
Not provided
groupon.com Logo
Groupon
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years in systems engineering
  • at least 5+ years in SRE or DevOps roles
  • expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker)
  • proficiency in programming and scripting languages like Python, Go, and Bash
  • advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • deep understanding of networking, DNS, load balancing, and security principles
  • proven track record of managing high-availability systems in demanding environments
  • exceptional analytical and problem-solving skills
Job Responsibility
Job Responsibility
  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher
  • drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools
  • create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery
  • build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack
  • collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs
  • lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues
  • design and execute performance testing, capacity planning, and scalability strategies for evolving workloads
  • proactively identify and resolve bottlenecks, increasing system performance and developer efficiency
  • mentor junior engineers, fostering a collaborative and growth-oriented team environment
  • guide architectural decisions that drive innovation and enhance system reliability
What we offer
What we offer
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • a collaborative and innovative work values alignment that values your expertise and contributions
  • professional growth and leadership development pathways tailored to your aspirations
  • a chance to leave a lasting impact by shaping the future of reliable and scalable systems
Read More
Arrow Right

Engineering Lead Analyst

Engineering Lead Analyst position in Citi's Cloud Technology Services (CTS) team...
Location
Location
India , Pune
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 12 Plus years of relevant experience in an Engineering role
  • Deep understanding of public cloud services adoption at scale
  • Expert-level understanding of AWS/GCP Cloud Network across Internet Application Hosting, B2B Connectivity, and Application Resiliency
  • Infrastructure as Code (IaC) Hands On Expertise with Python and Go
  • CI/CD experience with Terraform, Harness, Tekton, Jenkins, etc.
  • Testing Automation experience with Terratest, Cucumber, PytestBD, AWS Fault Injection Simulator (FIS), Chaos Mesh, etc.
  • Familiarity with Agile Development, DevOps, and SRE practices
  • Demonstrated ability to quickly learn new technologies and adapt to changing project requirements
  • Experience evaluating complex requirements and rationalizing them into consistent service offering
  • Excellent communication skills
Job Responsibility
Job Responsibility
  • Technical Expertise: hands-on technical contribution within product team focused on public cloud network
  • Collaborative Development: contribute to team of cloud engineers and full-stack software developers
  • Automation: Identify and develop automation initiatives to improve processes related to public cloud services consumption
  • Cross-Functional Partnership: collaborate with teams across Citi's technology landscape
  • Engineering Excellence: contribute to defining and measuring success criteria for service availability and reliability
  • Compliance Advocacy: ensure adherence to relevant standards, policies, and regulations
  • Serve as technology subject matter expert for internal and external stakeholders
  • Provide direction for firm mandated controls and compliance initiatives
  • Define necessary system enhancements to deploy new products and process enhancements
  • Recommend product customization for system integration
What we offer
What we offer
  • Career growth opportunities
  • Opportunity to give back to community
  • Make real impact
  • Global team environment
  • Well-being support
  • Work-life balance programs
  • Fulltime
Read More
Arrow Right

Engineering Manager, Infrastructure

As an Engineering Manager for the Infrastructure team, you’ll lead the engineers...
Location
Location
Canada; United States
Salary
Salary:
195000.00 - 285000.00 USD / Year
apollo.io Logo
Apollo.io
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of hands-on software or infrastructure engineering experience
  • 2+ years of experience leading teams of senior and staff-level engineers in platform, SRE, or infrastructure domains
  • Proven ability to design and operate large-scale distributed systems in cloud environments (preferably GCP or AWS)
  • Expertise with Kubernetes, Docker, Terraform, Ubuntu, and CI/CD pipelines
  • Familiarity with observability tools (Grafana, Prometheus, ELK, Datadog, NewRelic) and performance tuning
  • Strong grounding in networking, security, and reliability principles
  • Experience managing infrastructure costs, availability SLAs, and high-throughput systems at scale
Job Responsibility
Job Responsibility
  • Lead, coach, and grow a distributed team of high-impact Infrastructure Engineers
  • Partner with senior engineering leadership on strategic initiatives such as cloud migration, infrastructure scaling, platform reliability, and cost efficiency
  • Define and implement modern operational excellence practices, including SLOs, error budgets, incident reviews, and performance monitoring
  • Guide technical decision-making across key areas like Kubernetes, GCP, observability, networking, CI/CD, and IaC (Terraform, Ansible)
  • Collaborate with AI, Data, and Product Engineering teams to ensure infrastructure scalability for ML and AI-native workloads
  • Run effective 1:1s, career development conversations, and quarterly performance reviews
  • Support recruiting efforts to attract top engineering talent across time zones
What we offer
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
  • Fulltime
Read More
Arrow Right

Principal Network Operations Site Reliability Systems Engineer

This role entails incorporating Site Reliability Engineering (SRE) concepts into...
Location
Location
United States
Salary
Salary:
115500.00 - 266000.00 USD / Year
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or master’s degree in computer science, Computer Engineering, Information Systems, or equivalent
  • Typically, 10+ years’ experience
  • Experience with cloud platforms
  • Experience with software development languages for console and web-based applications
  • Experience in User Interface (UI/UX) design
  • Understanding of and experience with common network infrastructure devices such as switches, routers, access points, authentication, authorization, and accounting systems and protocols, and network management utilities
  • Experience with network monitoring protocols
  • Ability to design and implement relational database solutions, time-series databases, and NoSQL database solutions
  • Excellent analytical and problem-solving skills
  • Experience in the overall architecture of software systems for products and solutions
Job Responsibility
Job Responsibility
  • Develop strategies and implement plans to incorporate SRE concepts into network, tool, and process designs and leads execution of those strategies and plans
  • Evaluates LAN, WLAN, SD-WAN, AAA, Private 5G, and other network designs for fit-for-use criteria, and designs prototype analysis tools to facilitate rapid iteration of network delivery service enhancements
  • Identifies and engineers new ways to leverage data from multiple platforms to identify network performance trends and detect anomalies
  • Prototypes machine learning anomaly detection, event signature identification, and trend identification
  • Automates common incident management and problem management procedures
  • Develops organization-wide architectures, methodologies, and prototypes for software systems design and development across multiple platforms and organizations within the Global Business Unit
  • Identifies and evaluates new technologies and innovations for alignment with technology roadmap and business value
  • creates plans for prototyping and prototype iteration
  • Reviews and evaluates designs and project activities for compliance with development guidelines and standards
  • provides tangible feedback to improve product quality and mitigate failure risk
What we offer
What we offer
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Career development programs
  • Inclusive environment celebrating individual uniqueness
  • Fulltime
Read More
Arrow Right