Lead Systems Operations Engineer – Platform Reliability Engineering (PRE)

United States, Charlotte · Employment contract · 119000.00 - 206000.00 USD / Year · Job Posted May 27, 2026

Job Description

Wells Fargo is seeking a Lead Systems Operations Engineer – Platform Reliability Engineering (PRE) within the CTO Platform organization. This role is aligned to modern Site Reliability Engineering (SRE) practices and is responsible for driving reliability, resiliency, observability, and operational excellence across critical platform and application services. The role is intended for senior engineers with deep expertise in one core platform domain, applying that expertise to proactively improve platform stability, scalability, and availability.

Job Responsibility

  • Act as a Platform Reliability Engineering (PRE) subject matter expert, providing deep technical leadership in one core domain (Database, Cloud, Network, Compute/Storage, Middleware, or Application Support)
  • Lead analysis and resolution of complex, systemic production reliability issues, translating recurring incidents into long-term engineering solutions
  • Apply SRE principles including SLIs, SLOs, error budgets, and incident-driven engineering improvements to both new and legacy platforms
  • Define and drive enterprise observability standards, including metrics, logs, traces, alerting, and service health dashboards
  • Design and implement automation-first solutions to reduce operational toil, improve MTTR, and enable self-healing and self-service
  • Partner with application, infrastructure, cloud, and support teams to improve availability, performance, capacity, and resiliency
  • Lead or contribute to blameless post-mortems, ensuring measurable and sustained reduction of repeat incidents
  • Translate complex technical and operational risks into clear, data-driven guidance for senior leadership
  • Mentor engineers and support staff on reliability engineering, observability, and automation best practices

Requirements

  • 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of experience in Systems Operations, SRE, Platform Engineering, or Production Support with deep expertise in at least one platform domain: Database, Cloud, Network, Compute/Storage, Middleware, or Enterprise Application Support

Nice to have

  • Strong hands-on experience applying SRE practices, including SLI/SLO definition, error budgets, and reliability metrics
  • Proven experience troubleshooting and resolving large-scale, distributed production systems
  • Hands-on experience with observability and monitoring tools such as Grafana, Splunk, Prometheus, Cribl, ThousandEyes, AppDynamics, or equivalent, including dashboards, alerting, logs, and metrics
  • Strong scripting and automation skills using Python, Bash, and/or PowerShell to reduce operational toil
  • Experience building automation or reliability tooling using APIs, Git-based workflows, and modern engineering practices
  • Solid understanding of incident, problem, and change management in enterprise production environments
  • Strong communication and influencing skills across engineering teams and senior leadership
  • Experience with capacity management, performance engineering, and resiliency design (HA, fault tolerance, RTO/RPO)
  • Experience operating in hybrid environments (on‑prem + cloud) with complex enterprise dependencies
  • Familiarity with infrastructure automation / IaC tools such as Ansible or Terraform
  • Ability to drive technical debt remediation for critical legacy platforms using structured backlogs
  • Experience mentoring or leading senior engineers in reliability, operations, or SRE-focused roles

What we offer

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

Similar Jobs for Lead Systems Operations Engineer – Platform Reliability Engineering (PRE)

8 matching positions

Combat-Fire Control, Weapons, Platform Systems Engineer

Arcfield, Inc. is seeking Combat Systems/Weapons/Platform Systems Engineer candi...
Location
United States, Middletown, Rhode Island
Salary
70090.43 - 121873.66 USD / Year
Arcfield
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in an Engineering Discipline (Electrical, Mechanical, Systems, etc.) preferred, will consider a BS in STEM fields with sufficient direct Engineering Experience
  • In lieu of a degree, candidates with significant, demonstrated experience operating one or more DoD and Navy Non-Propulsion Electronic-Mechanical Systems, especially Combat Control, Weapons (Torpedoes, Missiles, Payloads), Launcher, USW Sensor, Electronic Warfare, Imaging, and/or Radar systems (especially Submarine-specific equipment), will be strongly considered
  • Experience and knowledge associated with engineering functions related to design, troubleshooting, analysis, reporting, and/or testing of aforementioned complex systems
  • Candidates must be able to work independently, collaboratively, and interface regularly with a wide range of client personnel
  • Proficiency with MS office applications, especially Word, Excel, and PowerPoint, as well as Adobe Acrobat and MATLAB
  • Mentor technical team members and provide direct technical team tasking/scheduling guidance
  • Support scheduling, funding, and resource needs in conjunction with Program Manager
  • Must possess and maintain Secret security clearance
  • Previous experience in the following elements/technical areas is required: AN/BYG-1, SWFTS architecture (hardware/software/tactical equipment) for predecessor, current, and future Combat Control Systems
  • Knowledge of Launcher Systems Equipment
Job Responsibility
  • Design system modernization and capabilities expansions to meet emerging Navy needs
  • Analysis for and development of Engineering Change Instructions and Engineering Change Proposals
  • Assess installation and system requirements to determine equipment layout
  • Perform and present gap, fault/failure, and performance analyses
  • Support/perform modeling, simulation and prototyping for enhanced and new systems
  • Provide recommendations for process improvements to existing systems and operational processes
  • Troubleshoot and recommend solutions for maintenance and enhancements to Non-Propulsion Submarine/Ship systems, including Combat Systems and associated Weapons (Torpedoes, Missiles, Other Payloads) in particular, as well as Launchers, Sonar, Radar, Periscopes, EW, and Navigation systems interfaces at NUWCDIVNPT and/or on-site platforms
  • Conduct test and evaluation and quality assessments for system requirements compliance, interoperability, reliability and cyber invulnerability
  • Develop reports and recommendation papers describing results of analyses, testing and problem investigations
  • Coordinate with Team Lead and work with the engineering team to support upgrades to existing weapon systems to meet data requirements
What we offer
  • Health Insurance
  • Life Insurance
  • Paid Time Off
  • Holiday Pay
  • Short Term and Long-Term Disability
  • Retirement and Savings
  • Learning and Development opportunities
  • Wellness programs as well as other optional benefit elections
  • Fulltime

Site Reliability Engineer

The Site Reliability Engineer (SRE) for Azure xDPU Storage Team – Hardware Enabl...
Location
United States, Redmond
Salary
84200.00 - 165200.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Associate's Degree in Computer Science, Information Technology, or related field OR Bachelor's Degree in Computer Science, Information Technology, or related field OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • Bachelor's Degree in Computer Science, Electrical Engineering, Computer Engineering, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Experience operating large-scale, distributed systems in a lab/validation
  • Experience working close to hardware, including networking, storage, or accelerator technologies such as SmartNICs, DPUs, or offload engines
  • Proficiency in one or more programming or scripting languages (C++, C#, Python, Go, or PowerShell), with experience reading lower-level system code
  • Hands-on experience with Microsoft and Azure lab infrastructure and live-site operations
  • Demonstrated understanding of networking, operating systems, and performance characteristics of I/O-intensive distributed systems
Job Responsibility
  • Own end-to-end reliability for Azure Storage hardware running in on-prem lab environments
  • Partner with silicon, firmware, BIOS, networking, and OS teams to enable and validate DPU hardware for specific storage use cases
  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for DPU-accelerated storage scenarios within our lab and pre-prod environments
  • Lead live-site incident response and mitigation for hardware-, firmware-, or DPU-related issues, including deep root-cause analysis across hardware/software boundaries within our lab and pre-prod environments
  • Build automation for provisioning, configuration, validation, canarying, rollback, patching, and recovery of DPU-enabled Azure Storage systems within our lab and pre-prod environments
  • Develop reliability validation strategies, including stress, fault-injection, and chaos testing for DPU hardware enablement and management
  • Create and maintain operational runbooks, diagnostics, telemetry, and health models specific to Fungible DPU platforms within our lab and pre-prod environments
  • Drive improvements in observability and alerting by extending Azure Monitor and internal systems with DPU- and hardware-level signals
  • Fulltime

Site Reliability Engineer

Microsoft has an exciting opportunity for a Senior Site Reliability Engineer (SR...
Location
United States, Reston
Salary
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role
  • The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required pre-offer and post-hire for this role
  • This position requires successful verification of the stated security clearance to meet federal government customer requirements
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • This position requires verification of U.S. citizenship due to citizenship-based legal restrictions
Job Responsibility
  • Owns reliability architecture and end-to-end service understanding (dependencies, failure modes, and customer journeys) for distributed systems at scale
  • Defines and improves service health via SLIs/SLOs, error budgets, and well-defined operational readiness criteria
  • Drives cross-team reliability reviews and recommends design changes, runbooks, and safe rollout/rollback strategies that improve availability, latency, performance, and efficiency while managing cost
  • Maintains deep, current expertise in cloud reliability practices and the evolving technology landscape
  • Drives adoption of new platform capabilities and operational patterns (e.g., progressive delivery, resilience testing, chaos engineering where appropriate)
  • Mentors engineers through design reviews, incident walkthroughs, and knowledge sharing to raise the reliability bar across related services
  • Implements reliable, scalable, and high-performance changes using SRE practices (progressive delivery, feature flags where applicable, safe rollouts/rollbacks)
  • Owns implementation and rollback plans, validates operational readiness, and reduces toil through automation, self-healing, and standardized playbooks
  • Leverages telemetry and production signals to identify reliability risks and recurring failure patterns, then ships configuration changes, code fixes, or automation to address root causes
  • Expands infrastructure-as-code and operational tooling so teams can manage platforms and services safely and repeatably through code and policy
  • Fulltime

Principal Consultant A2 - Infra

Microsoft Industry Solution - Global Center Innovation and Delivery Center (GCID...
Location
India, Hyderabad
Salary
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field AND 3+ years of leadership experience in a relevant area of business. Higher education preferred
  • OR Master’s degree in Computer Science, Information Technology, Engineering, or related field AND 6+ years of experience in technology solutions, practice development, architecture, consulting, and/or the Cloud Infrastructure domain
  • At least 10 years of solid customer-facing project experience involving solution design, project envisioning, planning, development, and deployment of complex solutions
  • Must have a proven record of delivering technical solutions
  • 2+ years managing multiple projects or portfolios
  • 1+ year(s) experience leading blended, multidisciplinary teams
  • Preferred Qualifications: 20+ years of overall industry experience
  • Technical or Professional Certification in Cloud Infrastructure domain
  • Open to travel domestically and internationally and work with different cultures and customers
  • Technical certifications based on domain/service line (e.g., Azure, Security, Dynamics)
Job Responsibility
  • AI-First Delivery Leadership: Embed AI-first principles into delivery workflows, leveraging automation and intelligent orchestration where applicable
  • Lead end-to-end delivery of complex projects, ensuring solutions are scalable, robust, and aligned with client business outcomes
  • Drive engineering excellence through reusable components, accelerators, and scalable architecture
  • Oversee technical execution across multiple projects, ensuring adherence to best practices, quality standards, and compliance requirements
  • Collaborate with clients and internal stakeholders to define strategies, delivery plans, milestones, and risk mitigation approaches
  • Act as a technical point of contact for clients, translating business requirements into scalable technical solutions
  • Ensure delivery models are optimized for modern, AI-native execution, including integration of automation and intelligent processes
  • Ability to step into at-risk projects, quickly assess issues, and establish a credible path to recovery or exit
  • Engineering Excellence: Champion high-quality engineering practices across all delivery engagements
  • Ensure adherence to coding standards, architectural integrity, and performance benchmarks
  • Fulltime

Senior Software Engineer

Senior Software Engineer - CTJ - Poly
Location
United States, Redmond
Salary
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role
  • The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required pre-offer and post-hire for this role
  • Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination
  • This position requires successful verification of the stated security clearance to meet federal government customer requirements
  • You will be asked to provide clearance verification information prior to an offer of employment
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • This position requires verification of U.S. citizenship due to citizenship-based legal restrictions
  • To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. Government clearance
Job Responsibility
  • Design, build, and operate distributed database services for secure, air‑gapped cloud environments, spanning infrastructure, automation, and platform tooling
  • Own production reliability and operability of database services, including availability, performance, backup and restore, upgrades, and incident response
  • Develop secure, compliant systems that meet strict isolation, regulatory, and supply‑chain requirements in sovereign and disconnected clouds
  • Automate the full service lifecycle end‑to‑end, reducing manual toil across provisioning, deployment, patching, failover, and recovery
  • Troubleshoot and resolve complex distributed system failures, leading deep root‑cause analysis and driving durable, long‑term fixes
  • Collaborate cross‑functionally with engineering, security, networking, and compliance teams to deliver fully operable database platforms
  • Set a high engineering and operational bar, contributing high‑quality code, strong design reviews, and mentoring engineers who build what they run
  • Fulltime

Principal Consultant, Cloud & Platform Engineering (AWS Focus)

The Principal Consultant is a senior technical leader who partners closely with ...
Location
United States
Salary
180000.00 - 205000.00 USD / Year
Turnberry Solutions
Expiration Date
Until further notice
Requirements
  • 10 or more years of experience in cloud architecture, platform engineering, or enterprise systems
  • Strong background delivering consulting or professional services engagements
  • Experience working directly with client leaders and technical decision-makers
  • Comfortable balancing hands-on technical depth with strategic advisory responsibilities
  • Deep experience designing and operating solutions on AWS in production environments
  • Strong understanding of services such as EC2, ECS, EKS, Lambda, VPC, IAM, Route 53, RDS, DynamoDB, and S3
  • Experience designing for security, high availability, disaster recovery, and cost optimization
  • AWS certifications preferred, especially Solutions Architect Professional
  • Working knowledge of Azure and/or GCP
  • Ability to compare cloud services, patterns, and tradeoffs across providers
Job Responsibility
  • Serve as a senior technical advisor to client stakeholders, including engineering leaders, architects, and executives
  • Lead discussions around cloud strategy, platform design, and modernization approaches
  • Help clients define realistic roadmaps that balance business goals, security, cost, and operational maturity
  • Translate business problems into clear technical solutions that teams can execute
  • Partner with the sales team during pre-sales activities to support opportunity development
  • Participate in client discovery calls, technical deep dives, and solutioning sessions
  • Help define scope, architecture approach, assumptions, and risks during proposal development
  • Review and contribute to statements of work, ensuring technical accuracy and feasibility
  • Act as a trusted technical voice in early client conversations to build credibility and trust
  • Design and review enterprise-grade AWS architectures, including networking, identity, security, compute, and data services
What we offer
  • Comprehensive healthcare package (medical, dental, vision)
  • Disability and group term life insurance
  • Health and flexible spending accounts
  • Utilization bonus
  • 401(k) with match
  • Flexible time off for salaried employees
  • Parental leave for salaried employees
  • Flexible work arrangements
  • Fulltime

DevOps Engineer

As a DevOps Engineer, you will bring all of the diverse forces together, bridgin...
Location
Philippines, Makati
Salary
Not provided
LawAdvisor Ventures Ltd.
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in a Computer Science field or equivalent
  • Minimum 2 years of experience in cloud platform technologies – AWS
  • Experience with at least one scripting tool (Terraform, UNIX shell, etc.)
  • Experience using an operational ticketing system such as Jira to record changes and work history details
  • Demonstrated ability to support and administer high-volume pre-release and production environments for internal-facing applications
  • Knowledge of disaster recovery concepts to effectively respond to incidents and mitigate disasters
  • Experience supporting advanced storage system software (Replication, Point-in-time Restoration, Tiered Backup strategies)
  • Tangible understanding of the best and worst security practices, concepts, and real-world applications
  • Experience working within an agile framework
  • A roll-up-your-sleeves attitude: be an independent self-starter possessing excellent time management skills and be able to manage multiple implementation activities simultaneously
Job Responsibility
  • Be at the core of the company’s technological systems: you will be accountable in the team for technical decision making, defining successful outcomes, and owning engineering execution
  • Know your product's features and clients’ pain points well enough to devise its infrastructure requirements, which you will translate into drop-in solutions
  • Establish a platform strategy that provides observability and layers of redundancy, and limits the blast radius in the event of system failure
  • Work alongside a team of talented engineers, designers, product managers and quality assurance specialists to direct priorities and ensure successful execution
  • Create and maintain a set of tools and workflows to allow engineers to ship code to production in an efficient and reliable manner
  • Perform cross-functional tasks supporting business projects, infrastructure projects, change management with our various infrastructure platforms (database, operating systems, application teams) as it pertains to implementation, configuration, troubleshooting, and upgrades
  • Lead implementation of compliance and security frameworks (ISO 27001, Cyber Essentials), ensuring platform systems and processes meet regulatory and data protection standards
  • Maintain a high standard of code quality by evaluating, approving, and offering feedback on code submissions from your colleagues
  • Convey complex technical concepts clearly and succinctly, drawing on top-class communication skills
  • Always be on the lookout for new technologies: identify new and emerging technologies for adoption, drive consistent code reviews, and propose changes where needed
What we offer
  • A highly-skilled, driven and dedicated team
  • Competitive salary: We strive to always provide industry market rates
  • Continuous learning and development: whether by way of conferences, online courses, or further study, we’re here to support your personal and professional growth
  • Company retreats
  • A direct line with our key users and influential high-level stakeholders (investors, advisors, and other relevant members) to use as and when needed

Principal Consultant - Apps

Microsoft Industry Solutions - Global Center Innovation and Delivery Center (GCI...
Location
India, Bangalore
Salary
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • 20+ years of experience in software/solution engineering, with at least 10–15 years in Architect and delivery leadership roles
  • Proven experience in leading delivery of complex, multi-disciplinary projects
  • Strong understanding of modern delivery methodologies (Agile, Scrum, DevOps, etc.)
  • Excellent communication, stakeholder management, problem-solving, and team leadership skills
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • Relevant certifications are a plus
  • End‑to‑end design and development of modern web and mobile applications using React, Angular, Next.js, Blazor, or equivalent frameworks
  • Strong backend engineering expertise using .NET, Java, Node.js, or Python, applying clean architecture, domain‑driven design, and API‑first principles
  • Experience building scalable microservices and distributed systems, leveraging REST, gRPC, event‑driven architectures, and asynchronous processing
  • Hands‑on data and platform fundamentals, including relational and NoSQL databases (Azure SQL, PostgreSQL, Cosmos DB), performance tuning, scalability, resiliency, and application security
Job Responsibility
  • Embed AI-first principles into delivery workflows, leveraging automation and intelligent orchestration where applicable
  • Own the architecture and drive end-to-end delivery of complex projects, ensuring solutions are scalable, robust, and aligned with client business outcomes
  • Drive engineering excellence through reusable components, accelerators, and scalable architecture
  • Oversee technical execution across multiple projects, ensuring adherence to best practices, quality standards, and compliance requirements
  • Collaborate with clients and internal stakeholders to define strategies, delivery plans, milestones, and risk mitigation approaches
  • Act as a technical point of contact for clients, translating business requirements into scalable technical solutions
  • Ensure delivery models are optimized for modern, AI-native execution, including integration of automation and intelligent processes
  • Ability to step into at‑risk projects, quickly assess issues, and establish a credible path to recovery or exit
  • Champion high-quality engineering practices across all delivery engagements
  • Ensure adherence to coding standards, architectural integrity, and performance benchmarks