Site Reliability Operations III

United States of America, Bentonville · Employment contract · 80000.00 - 155000.00 USD / Year · Job Posted January 07, 2026

Job Description

The Command & Control Center is the nerve center for Walmart Global Technology. On the Logistics Support team, we proactively monitor critical supply chain applications and infrastructure, providing early warnings and rapid response to potential disruptions. Our team ensures seamless operations by swiftly mitigating incidents and leveraging advanced automation and AI-driven monitoring to keep Walmart’s supply chain resilient and efficient.

Job Responsibility

  • Monitor and alert on software or system performance, determining thresholds for monitoring metrics and triggering alerts when those thresholds are crossed
  • Supervise specific procedures to proactively check the health of applications and infrastructure, including a variety of operating systems, hardware, and software
  • Investigate and diagnose incidents to restore a failed IT service as quickly as possible and within specified SLAs
  • Document troubleshooting steps and service restoration details for knowledge management
  • Act as liaison between Tech and external support teams to resolve escalated incidents and ensure timely closure
  • Record and classify incoming incidents and take immediate corrective action on moderate-complexity queries under moderate supervision
  • Research and recommend alternative actions for incident resolution
  • Contribute to command-and-control activities focused on restoring service during complex outages
  • Conduct complex maintenance procedures for applications independently
  • Monitor and evaluate the performance of the application by tracking and analyzing appropriate metrics
  • Perform maintenance (corrective, adaptive, perfective) and re-engineering activities
  • Analyze application logs, maintenance activity data, and performance data, and report the findings
  • Evaluate change requests to identify those which are valid and feasible
  • Troubleshoot performance and availability bottlenecks for assigned applications independently
  • Triage defects to distinguish symptoms from underlying causes
  • Actively provide data for and participate in root cause analysis (RCA)
  • Build, maintain, and enhance effective internal and external partnerships
  • Influence technical outcomes and assist in communicating shared goals with diverse groups and parties
  • Identify and address additional partner technical needs and educate them on value creation
  • Communicate with other individuals or teams to solve shared business problems cooperatively
  • Bring ideas and technical solutions proactively to business partners and stakeholders
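As an illustration of the threshold-based alerting described in the responsibilities above, here is a minimal Python sketch. The rule structure, metric name, and threshold values are hypothetical assumptions, not Walmart's actual tooling; in practice this logic usually lives in the alerting system itself (for example, a Prometheus alerting rule whose `for:` duration plays the role of the `consecutive` debounce used here).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ThresholdRule:
    """Hypothetical alert rule: fire when a metric stays above a threshold."""
    metric: str
    warning: float
    critical: float
    consecutive: int = 3  # samples that must all breach before alerting (debounce)


def evaluate(rule: ThresholdRule, samples: Iterable[float]) -> Optional[str]:
    """Return 'critical', 'warning', or None based on the most recent samples.

    An alert fires only if the last `rule.consecutive` samples all breach a
    threshold, which suppresses one-off spikes that would otherwise page someone.
    """
    if rule.consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    window: List[float] = list(samples)[-rule.consecutive:]
    if len(window) < rule.consecutive:
        return None  # not enough data yet to make a decision
    if all(v >= rule.critical for v in window):
        return "critical"
    if all(v >= rule.warning for v in window):
        return "warning"
    return None


# Usage with a made-up latency metric (milliseconds).
rule = ThresholdRule(metric="order_api_p99_latency_ms", warning=500, critical=1000)
print(evaluate(rule, [320, 610, 720, 880]))  # warning
print(evaluate(rule, [1200, 1500, 1100]))    # critical
print(evaluate(rule, [1200, 400, 1100]))     # None
```

Requiring several consecutive breaches trades a little detection latency for far fewer false alarms; the warning/critical split lets lower-severity breaches open a ticket while only sustained critical breaches page the on-call responder.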

Requirements

  • Strong communication and interpersonal skills
  • Experience with Jira, Looper, and Kubernetes
  • Familiarity with Grafana and the ability to write PromQL queries
  • GitHub experience
  • Database knowledge is preferable but not required
  • Ability to work independently and make decisions with guidance
  • Ability to understand changes to methodologies and resources and to explain them to others
  • Experience with cloud applications and the ability to pull their logs
  • Strong analytical and problem-solving skills
  • Ability to work collaboratively with cross-functional teams
  • Experience with incident management and troubleshooting
  • Strong technical skills, including proficiency in monitoring and alerting, incident management, and DevOps orientation
  • Immigration sponsorship is not available for this role

Nice to have

  • Experience in site reliability operations, site and system administration, infrastructure management, or a related area
  • Master's degree in site reliability operations, site and system administration, infrastructure management, or a related area
  • SRE certification (for example, IBM Cloud Site Reliability Engineer)
  • We value candidates with a background in creating inclusive digital experiences, including knowledge of implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, working with assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would know accessibility best practices and join us as we continue to create accessible products and services that follow Walmart’s accessibility standards and guidelines in support of an inclusive culture.

What we offer

  • Multiple health plan options, including vision & dental plans for you & dependents
  • Financial benefits including 401(k), stock purchase plans, life insurance and more
  • Associate discounts in-store and online
  • Education assistance for Associates and dependents
  • Parental leave
  • Pay during military service
  • Paid time off, including vacation, sick, and parental leave
  • Short-term and long-term disability for when you can't work because of injury, illness, or childbirth
  • Incentive awards for your performance
  • Maternity and parental leave, PTO, health benefits
  • Performance-based bonus awards
  • Company discounts
  • Adoption and surrogacy expense reimbursement

Similar Jobs for Site Reliability Operations III (8 matching positions)

Site Reliability Engineer III

We're looking for a senior Site Reliability Engineer to join our small, high-own...
Location: United States
Salary: 148320.00 - 185400.00 USD / Year
Company: AbsenceSoft
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE, DevOps, or a related engineering role
  • Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
  • Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
  • Experience building and operating CI/CD pipelines using Jenkins and GitHub
  • Proficiency in Python, Go, or Bash for automation
  • Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
  • Demonstrated experience leading incident response in complex, distributed systems
  • Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
  • Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
  • A collaborative, ownership-driven mindset with strong communication skills
Job Responsibility
  • Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
  • Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
  • Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
  • Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
  • Define and maintain SLOs, SLIs, and error budgets
  • Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
  • Lead blameless postmortems
  • Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
  • Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
  • Mentor junior SREs through code reviews, incident pairing, and documentation
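The SLO and error-budget work listed above rests on simple arithmetic: for an availability SLO over a window, the error budget is (1 - SLO) × window length, so 99.9% over 30 days allows roughly 43.2 minutes of unavailability. The Python sketch below assumes a 30-day window and a request-based SLI (good requests / total requests); both are illustrative assumptions, not AbsenceSoft's actual definitions.

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Allowed 'bad' minutes for an availability SLO over a window (assumed 30 days)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1, e.g. 0.999")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left, using a request-based SLI (good / total).

    1.0 means nothing spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1, e.g. 0.999")
    if total <= 0:
        return 1.0  # no traffic yet, so no budget consumed
    allowed_bad = (1.0 - slo) * total  # failures the SLO tolerates for this traffic
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


print(round(error_budget_minutes(0.999), 1))                              # 43.2
print(round(budget_remaining(0.999, good=999_500, total=1_000_000), 2))   # 0.5
print(round(budget_remaining(0.999, good=998_000, total=1_000_000), 2))   # -1.0
```

A common use of the remaining-budget figure is as a release gate: teams keep shipping while budget remains and shift effort toward reliability work once it approaches zero.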
What we offer
  • Impact that matters
  • Flexibility and trust
  • Remote-first and results driven
  • Growth and development
  • Access to learning resources, leadership programs, and real opportunities to take on new challenges
  • Competitive rewards
  • Comprehensive benefits
  • Performance-based bonus program
  • Equity opportunities
  • Time for life
  • Full-time

Site Reliability Engineer III

The Site Reliability Engineer is responsible for designing, developing, and main...
Location: India, Hyderabad
Salary: Not provided
Company: Amgen
Expiration Date: Until further notice
Requirements
  • Doctorate degree OR 6 to 10 years of Computer Science, IT or related field experience OR
  • Master’s degree and 7 to 10 years of Computer Science, IT or related field experience OR
  • Bachelor’s degree and 8 to 12 years of Computer Science, IT or related field experience
  • Working experience with cloud services on AWS, Azure, or GCP and with containerization technologies (Docker, Kubernetes)
  • Strong programming skills in languages such as Python
  • Working experience with infrastructure as code (IaC) tools (Terraform, CloudFormation)
  • Working experience with monitoring and alerting tools (Prometheus, Grafana, etc.)
  • Working experience with DevOps/MLOps practice and CI/CD pipelines
  • Proficiency in automated testing tools and frameworks (e.g., Selenium, JUnit, pytest), plus experience with incident management, production issue root cause analysis, and improving system quality
Job Responsibility
  • Design and implement systems and processes to improve the reliability, scalability, and performance of applications
  • Automate routine operational tasks, such as deployments, monitoring, and incident response, to improve efficiency and reduce human error
  • Develop and maintain monitoring tools and dashboards to track system health, performance, and availability
  • Respond to and resolve incidents promptly, conducting root cause analysis and implementing preventive measures
  • Provide ongoing maintenance and support for existing systems, ensuring that they are secure, efficient, and reliable
  • Work on integrating various software applications and platforms to ensure seamless operation across the organization
  • Implement and maintain security measures to protect systems from unauthorized access and other threats
What we offer
  • Competitive and comprehensive Total Rewards Plans that are aligned with local industry standards

Site Reliability Engineer III

Under limited supervision, the Site Reliability Engineer III is responsible for ...
Location: United States, Birmingham
Salary: Not provided
Company: Alliance Automotive UK LV Ltd
Expiration Date: Until further notice
Requirements
  • Typically requires a bachelor's degree and five (5) or more years of related experience or an equivalent combination
  • Understanding of Kubernetes, containers, clusters, and elastic scalability
  • Expertise in SRE principles
  • Mindset of continually finding ways to drive scalability, stability, and performance
  • Cloud Services experience with Google Cloud Platform (GCP)
  • Experience with API, service-based or microservice-based architecture
  • Proficiency in infrastructure, network, database, operating systems, or security troubleshooting and remediation
  • Architecture-level knowledge of Windows and Linux and Infrastructure systems
  • Experience with production deployment, monitoring, and operational support for enterprise-class applications (Dynatrace a plus)
  • Experience working with Continuous Integration/Continuous Deployment (CI/CD) tools
Job Responsibility
  • Gathers and analyzes metrics from monitoring platforms to assist in performance tuning and fault tolerance
  • Partners with development teams to improve services through testing and release procedures
  • Participates in system design, platform management and capacity planning
  • Balances feature development speed and reliability with service-level objectives
  • Works closely with the incident response team to restore service to normal operation
  • Applies debugging and troubleshooting skills
  • Investigates, blocks and rate-limits unwanted traffic
  • Utilizes monitoring systems and dashboards for proactive changes and alerting
  • Establishes continuous process improvement cycles where the process, performance, and supporting technologies are reviewed and enhanced where applicable
  • Performs other duties as assigned
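"Investigates, blocks and rate-limits unwanted traffic" can be illustrated with a token bucket, one common rate-limiting technique; the posting does not say which technique or tooling the team actually uses. In this minimal Python sketch, each client key (assumed here to be a source IP) gets a bucket that refills at `rate` tokens per second up to `capacity`, and a request is allowed only if a token is available. The class names and parameters are hypothetical, and the injectable clock exists only so the behavior can be demonstrated deterministically.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket: refills `rate` tokens/second and holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: the caller should reject or delay the request


class PerClientLimiter:
    """Keeps one independent bucket per client key (e.g. a source IP address)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = self.buckets[client] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()


# Usage with a fake clock: 1 request/second sustained, bursts of up to 3.
fake_now = [0.0]
limiter = PerClientLimiter(rate=1.0, capacity=3, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7") for _ in range(5)])  # [True, True, True, False, False]
fake_now[0] = 2.0  # two seconds later, two tokens have been refilled
print([limiter.allow("203.0.113.7") for _ in range(3)])  # [True, True, False]
```

The bucket tolerates short bursts while capping the sustained rate; outright blocking, by contrast, is usually a separate deny list checked before the limiter.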
What we offer
  • Options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay
  • Full-time

Site Reliability Engineer III

Under limited supervision, the Site Reliability Engineer III is responsible for ...
Location: United States, Birmingham, Alabama
Salary: Not provided
Company: Genuine Parts Company
Expiration Date: Until further notice
Requirements
  • Typically requires a bachelor's degree and five (5) or more years of related experience or an equivalent combination
  • Understanding of Kubernetes, containers, clusters, and elastic scalability
  • Expertise in SRE principles
  • Mindset of continually finding ways to drive scalability, stability, and performance
  • Cloud Services experience with Google Cloud Platform (GCP)
  • Experience with API, service-based or microservice-based architecture
  • Proficiency in infrastructure, network, database, operating systems, or security troubleshooting and remediation
  • Architecture-level knowledge of Windows and Linux and Infrastructure systems
  • Experience with production deployment, monitoring, and operational support for enterprise-class applications (Dynatrace a plus)
  • Experience working with Continuous Integration/Continuous Deployment (CI/CD) tools
Job Responsibility
  • Gathers and analyzes metrics from monitoring platforms to assist in performance tuning and fault tolerance
  • Partners with development teams to improve services through testing and release procedures
  • Participates in system design, platform management and capacity planning
  • Balances feature development speed and reliability with service-level objectives
  • Works closely with the incident response team to restore service to normal operation
  • Applies debugging and troubleshooting skills
  • Investigates, blocks and rate-limits unwanted traffic
  • Utilizes monitoring systems and dashboards for proactive changes and alerting
  • Establishes continuous process improvement cycles where the process, performance, and supporting technologies are reviewed and enhanced where applicable
  • Performs other duties as assigned
What we offer
  • Options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay
  • Full-time

Site Reliability Engineer III

Zuora’s Cloud Engineering teams are responsible for Cloud infrastructures, monit...
Location: India, Chennai
Salary: Not provided
Company: Zuora
Expiration Date: Until further notice
Requirements
  • 6-8 years of relevant experience in SRE/DevOps
  • Proven hands-on working experience with core AWS services (e.g., EC2, VPC, S3, RDS, IAM, CloudWatch, EKS/ECS)
  • Deep expertise in infrastructure-as-code principles using Terraform for provisioning and state management
  • Expert-level knowledge and practical experience with configuration management tools such as Puppet and/or Ansible
  • Strong experience setting up, maintaining, and enhancing Continuous Integration/Continuous Deployment pipelines using Jenkins
  • Proficiency in scripting languages, particularly Python and/or Shell scripting, for developing automation tools and performing system administration tasks
  • Advanced knowledge of Linux operating systems, including performance tuning, troubleshooting, security, and networking fundamentals
  • Working knowledge and operational experience with distributed messaging queues, specifically Kafka
Job Responsibility
  • Maintain and improve the reliability, scalability, and performance of our production systems, targeting a high-availability environment
  • Design, implement, and maintain automation solutions for infrastructure provisioning, deployment, configuration management, and monitoring using Terraform and Jenkins
  • Administer, manage, and optimize our cloud infrastructure primarily hosted on AWS, focusing on cost efficiency and secure operations
  • Develop and maintain infrastructure-as-code using Puppet and/or Ansible to ensure consistent and reproducible environments
  • Participate in on-call rotation, troubleshoot and resolve critical production incidents, and conduct comprehensive post-mortems to prevent recurrence
  • Apply strong Linux administration skills to manage, patch, and secure operating systems and underlying infrastructure
  • Manage and optimize distributed messaging systems, specifically Kafka, ensuring high throughput and data integrity
What we offer
  • Competitive compensation, variable bonus and performance reward opportunities, and retirement programs
  • Medical Insurance
  • Generous, flexible time off
  • Paid holidays, “wellness” days, and a company-wide end-of-year break
  • Learning & Development stipend
  • Opportunities to volunteer and give back, including charitable donation match
  • Free resources and support for your mental wellbeing

Phlebotomist III Site Lead

Represents the face of our company to patients who come to Quest Diagnostics. Th...
Location: United States, Overland Park
Salary: 20.81 USD / Hour
Company: Quest Diagnostics
Expiration Date: Until further notice
Requirements
  • Five years phlebotomy experience required, inclusive of pediatric, geriatric, and capillary collections
  • Keyboard/data entry experience
  • Flexible and available based on staffing needs, which includes weekends, holidays, on-call and overtime
  • Must have reliable transportation, valid driver's license, and clean driving record, if applicable
  • Travel and flexible hours required to work multiple locations and required to cover at Patient Service Center/Mobile/Long-Term Care/In-Office Phlebotomy locations with minimal notice
  • High School Diploma or Equivalent (Required)
  • The position requires the ability to effectively communicate in English
  • Phlebotomy certification (required in certain states, e.g. California, Nevada, Washington and Louisiana) (Preferred)
Job Responsibility
  • Collect specimens according to established procedures
  • Administer oral solutions according to established training
  • Research test/client information and confirm and verify all written and electronic orders
  • Responsible for completing all data entry requirements accurately including data entry of patient registration
  • Enter billing information and collect payments when required
  • Data entry and processing specimens including labeling, centrifuging, splitting, and freezing specimens as required by test order
  • Perform departmental-related clerical duties when assigned such as data entry, inventory, stock supplies, and answer phones when needed
  • Read, understand and comply with departmental policies, protocols and procedures
  • Perform verification of patient demographic info/initials including patient signature post-venipuncture
  • Assist with compilation and submission of monthly statistics and data
What we offer
  • Day 1 Medical, supplemental health, dental & vision for FT employees who work 30+ hours
  • Best-in-class well-being programs
  • Annual, no-cost health assessment program Blueprint for Wellness®
  • healthyMINDS mental health program
  • Vacation and Health/Flex Time
  • 6 Holidays plus 1 "MyDay" off
  • FinFit financial coaching and services
  • 401(k) pre-tax and/or Roth IRA with company match up to 5% after 12 months of service
  • Employee stock purchase plan
  • Life and disability insurance, plus buy-up option
  • Full-time

O&M Site Technician III

This position will involve work at solar PV power plants maintaining, testing, a...
Location: United States, Ganado
Salary: Not provided
Company: Enel
Expiration Date: August 31, 2026
Requirements
  • Ability to use electrical test equipment such as DMMs, meggers, and multi-process calibration equipment
  • Strong knowledge in electrical theory and troubleshooting techniques
  • Willingness to work overtime when required
  • Associate degree or equivalent desired
  • 0-3 years of related experience desired
  • TX Electrician's License
  • Availability for daily regional travel to job sites as needed
  • A valid driver's license operable in all 50 states and Canada and a clean driving record
  • The use of PPE is required and must be consistently used in accordance with EGPNA, NFPA, or OSHA guidelines
Job Responsibility
  • Testing, maintaining, troubleshooting, and replacing solar PV power plant related electrical equipment
  • Gaining technical information from electrical schematics, vendor information, and applicable codes
  • Perform current and voltage readings as necessary
  • Understand reliability-based maintenance philosophies and work towards certifications such as CMRT and CMRP
  • Perform plant related preventative, reactive, and predictive maintenance scheduled by the Site Manager and/or Lead Technician
What we offer
  • Affordable, quality healthcare for you and your family
  • Life insurance and disability benefits
  • Retirement benefits
  • Flexible spending accounts
  • Tuition reimbursement
  • Professional development allowance
  • 401k with match fully vested as of day one
  • 4 weeks annually of vacation
  • Personal days
  • Volunteer days
  • Full-time

Electric Operations Resource Coordinator III

We are looking for an experienced Electric Operations Resource Coordinator III t...
Location: United States, Providence
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • 4-6 years of relevant experience in scheduling, operations coordination, project management, or construction planning, ideally within the utility industry
  • Associate degree from a technical school is preferred
  • Working knowledge of electric utility operations, including overhead distribution, underground line work, or substation activities
  • Experience coordinating multiple projects and adjusting schedules in real time based on changing field conditions and resource constraints
  • Ability to work effectively with supervisors, contractors, and cross-functional stakeholders while communicating priorities clearly in writing and verbally
  • Strong computer skills with experience using scheduling, tracking, and project documentation tools
  • Understanding of construction practices, materials coordination, compliance expectations, and workload prioritization
  • Willingness to work on-site in Providence, Rhode Island and support emergency restoration or storm-related assignments when required
Job Responsibility
  • Direct daily and forward-looking scheduling for electric distribution construction activities, aligning crew assignments with workload, geography, timing, and operational priorities
  • Coordinate both operating and capital work for internal teams and contracted field crews, ensuring customer commitments and project target dates are achieved
  • Prepare complete work packages for outsourced construction, including required drawings, job documentation, material requests, forms, and closeout records such as as-built information
  • Track active and pending work to reduce backlog and promote timely completion of projects across the assigned service area
  • Balance available labor and contractor capacity to support immediate demands while improving longer-term resource utilization
  • Review field progress and crew activity to confirm effective use of materials, equipment, and staffing in support of established schedules
  • Identify risks related to labor availability, site conditions, materials, or equipment constraints and recommend practical scheduling adjustments or escalation steps
  • Lead or support coordination meetings with stakeholders to establish priorities, confirm readiness, and maintain compliance with company policies and construction expectations
  • Organize contingency, maintenance, and reliability-related work in a way that improves efficiency and maximizes fleet and crew productivity
  • Participate in storm response and emergency operations as needed, including availability during weather-related events or system emergencies
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Eligibility to enroll in the company 401(k) plan