Site Reliability Engineer III Job at Genuine Parts Company (Birmingham, Alabama)

Site Reliability Engineer III

We're looking for a senior Site Reliability Engineer to join our small, high-own...

Location

United States

Salary:

148320.00 - 185400.00 USD / Year

AbsenceSoft

Expiration Date

Until further notice

Requirements

5+ years of experience in SRE, DevOps, or a related engineering role
Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
Experience building and operating CI/CD pipelines using Jenkins and GitHub
Proficiency in Python, Go, or Bash for automation
Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
Demonstrated experience leading incident response in complex, distributed systems
Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
A collaborative, ownership-driven mindset with strong communication skills

Job Responsibility

Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
Define and maintain SLOs, SLIs, and error budgets
Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
Lead blameless postmortems
Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
Mentor junior SREs through code reviews, incident pairing, and documentation

What we offer

Impact that matters
Flexibility and trust
Remote-first and results-driven
Growth and development
Access to learning resources, leadership programs, and real opportunities to take on new challenges
Competitive rewards
Comprehensive benefits
Performance-based bonus program
Equity opportunities
Time for life

Fulltime

Site Reliability Engineer III

The Site Reliability Engineer is responsible for designing, developing, and main...

Location

India, Hyderabad

Salary:

Not provided

Amgen

Expiration Date

Until further notice

Requirements

Doctorate degree OR 6 to 10 years of Computer Science, IT or related field experience OR
Master’s degree and 7 to 10 years of Computer Science, IT or related field experience OR
Bachelor’s degree and 8 to 12 years of Computer Science, IT or related field experience
Working experience with cloud services on AWS (or Azure, GCP) and with containerization technologies (Docker, Kubernetes)
Strong programming skills in languages such as Python
Working experience with infrastructure-as-code (IaC) tools (Terraform, CloudFormation)
Working experience with monitoring and alerting tools (Prometheus, Grafana, etc.)
Working experience with DevOps/MLOps practice and CI/CD pipelines
Proficiency in automated testing tools and frameworks (e.g., Selenium, JUnit, pytest), incident management, root cause analysis of production issues, and improving system quality

Job Responsibility

Design and implement systems and processes to improve the reliability, scalability, and performance of applications
Automate routine operational tasks, such as deployments, monitoring, and incident response, to improve efficiency and reduce human error
Develop and maintain monitoring tools and dashboards to track system health, performance, and availability
Respond to and resolve incidents promptly, conducting root cause analysis and implementing preventive measures
Provide ongoing maintenance and support for existing systems, ensuring that they are secure, efficient, and reliable
Work on integrating various software applications and platforms to ensure seamless operation across the organization
Implement and maintain security measures to protect systems from unauthorized access and other threats

What we offer

Competitive and comprehensive Total Rewards Plans that are aligned with local industry standards

Site Reliability Engineer III

Under limited supervision, the Site Reliability Engineer III is responsible for ...

Location

United States, Birmingham

Salary:

Not provided

Alliance Automotive UK LV Ltd

Expiration Date

Until further notice

Requirements

Typically requires a bachelor's degree and five (5) or more years of related experience, or an equivalent combination of education and experience
Understanding of Kubernetes, containers, clusters, and elastic scalability
Expertise in SRE principles
Mindset of continually finding ways to drive scalability, stability, and performance
Cloud Services experience with Google Cloud Platform (GCP)
Experience with API, service-based or microservice-based architecture
Proficiency in infrastructure, network, database, operating systems, or security troubleshooting and remediation
Architecture-level knowledge of Windows and Linux and Infrastructure systems
Experience with production deployment, monitoring, and operational support for enterprise-class applications (Dynatrace a plus)
Experience working with Continuous Integration/Continuous Deployment (CI/CD) tools

Job Responsibility

Gathers and analyzes metrics from monitoring platforms to assist in performance tuning and fault tolerance
Partners with development teams to improve services through testing and release procedures
Participates in system design, platform management and capacity planning
Balances feature development speed and reliability with service-level objectives
Works closely with the incident response team to restore service to normal operation
Applies debugging and troubleshooting skills
Investigates, blocks and rate-limits unwanted traffic
Utilizes monitoring systems and dashboards for proactive changes and alerting
Establishes continuous process improvement cycles where the process, performance, and supporting technologies are reviewed and enhanced where applicable
Performs other duties as assigned

What we offer

Options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay

Fulltime

Site Reliability Engineer III

Zuora’s Cloud Engineering teams are responsible for Cloud infrastructures, monit...

Location

India, Chennai

Salary:

Not provided

Zuora

Expiration Date

Until further notice

Requirements

6-8 years of relevant experience in SRE/DevOps
Proven hands-on working experience with core AWS services (e.g., EC2, VPC, S3, RDS, IAM, CloudWatch, EKS/ECS)
Deep expertise in infrastructure-as-code principles using Terraform for provisioning and state management
Expert-level knowledge and practical experience with configuration management tools such as Puppet and/or Ansible
Strong experience setting up, maintaining, and enhancing Continuous Integration/Continuous Deployment pipelines using Jenkins
Proficiency in scripting languages, particularly Python and/or Shell scripting, for developing automation tools and performing system administration tasks
Advanced knowledge of Linux operating systems, including performance tuning, troubleshooting, security, and networking fundamentals
Working knowledge and operational experience with distributed messaging queues, specifically Kafka

Job Responsibility

Maintain and improve the reliability, scalability, and performance of our production systems, targeting a high-availability environment
Design, implement, and maintain automation solutions for infrastructure provisioning, deployment, configuration management, and monitoring using Terraform and Jenkins
Administer, manage, and optimize our cloud infrastructure primarily hosted on AWS, focusing on cost efficiency and secure operations
Develop and maintain infrastructure-as-code using Puppet and/or Ansible to ensure consistent and reproducible environments
Participate in on-call rotation, troubleshoot and resolve critical production incidents, and conduct comprehensive post-mortems to prevent recurrence
Apply strong Linux administration skills to manage, patch, and secure operating systems and underlying infrastructure
Manage and optimize distributed messaging systems, specifically Kafka, ensuring high throughput and data integrity

What we offer

Competitive compensation, variable bonus and performance reward opportunities, and retirement programs
Medical Insurance
Generous, flexible time off
Paid holidays, “wellness” days and company wide end of year break
Learning & Development stipend
Opportunities to volunteer and give back, including charitable donation match
Free resources and support for your mental wellbeing

Site Reliability Operations III

The Command & Control Center is the nerve center for Walmart Global Technology. ...

Location

United States of America, Bentonville

Salary:

80000.00 - 155000.00 USD / Year

Walmart

Expiration Date

Until further notice

Requirements

Strong communication and interpersonal skills
Experience with Jira, Looper, and Kubernetes
Familiarity with Grafana and ability to write queries (PromQL)
GitHub experience
Database knowledge is preferable but not required
Ability to work independently and make decisions with guidance
Ability to understand changes to methodologies and resources and to articulate them
Experience with cloud applications and ability to pull logs
Strong analytical and problem-solving skills
Ability to work collaboratively with cross-functional teams

Job Responsibility

Monitor and alert on software or system performance, determining thresholds for monitoring metrics and triggering alerts based on those thresholds
Supervise specific procedures to proactively check the health of applications and infrastructure, including a variety of operating systems, hardware, and software
Investigate and diagnose incidents to restore a failed IT service as quickly as possible and within specified SLAs
Document troubleshooting steps and service restoration details for knowledge management
Act as liaison between Tech and external support to resolve escalated incidents and ensure timely closure
Record and classify received incidents and undertake immediate corrective action for moderate-complexity queries under moderate supervision
Research and recommend alternative actions for incident resolution
Contribute to command-and-control related activities focused on restoration of complex outages
Conduct complex maintenance procedures for applications independently
Monitor and evaluate the performance of the application by tracking and analyzing appropriate metrics

What we offer

Multiple health plan options, including vision & dental plans for you & dependents
Financial benefits including 401(k), stock purchase plans, life insurance and more
Associate discounts in-store and online
Education assistance for Associate and dependents
Parental Leave
Pay during military service
Paid time off, including vacation, sick, and parental leave
Short-term and long-term disability for when you can't work because of injury, illness, or childbirth
Incentive awards for your performance
Maternity and parental leave, PTO, health benefits

Fulltime

Controls Engineer III – Robotic Arm

As a Controls Engineer – Robotics, you will own the full lifecycle engineering o...

Location

United States, Mendon

Salary:

110000.00 - 130000.00 USD / Year

Autonomous Solutions

Expiration Date

Until further notice

Requirements

Bachelor's degree in Robotics, Mechatronics, Mechanical Engineering, Electrical Engineering, or related field
5+ years of hands-on experience in robotic systems integration, with demonstrated end-to-end ownership - from concept and design through commissioning and field support
Strong experience with FANUC programming (TP/KAREL), system configuration, and integration
Meaningful, hands-on experience with vision systems (2D and/or 3D) - not just academic exposure. Must be able to configure, calibrate, troubleshoot, and adapt vision systems in real-world deployments
Proven ability to design or collaborate on custom EOAT and mechanical interface solutions
Demonstrated experience working across hardware/software boundaries, including sensors, PLCs, and system interfaces
Strong troubleshooting skills across mechanical, electrical, and software domains
Ability to operate independently in fast-paced, ambiguous, field-deployed environments
Strong communication and collaboration skills across cross-functional teams

Job Responsibility

Own end-to-end integration of FANUC robotic arm systems - from system architecture and motion planning through field deployment and ongoing optimization
Lead vision system integration and problem-solving, with particular focus on adapting 2D/3D vision to outdoor, variable-lighting environments including direct sun, glare, shadow, and low-light conditions
Partner with mechanical and electrical teams to design, iterate, and validate end-of-arm tooling (EOAT) and interface mechanisms - including custom tooling solutions for novel connection challenges
Develop robust motion strategies, error handling, and recovery routines capable of performing reliably in real-world, uncontrolled conditions
Troubleshoot complex field issues across mechanical, electrical, and software domains using structured root cause analysis
Collaborate with third-party vision system vendors and integrate their solutions into ASI's robotics platform
Drive testing, validation, and performance benchmarking against reliability, safety, and uptime requirements
Support on-site deployments and work directly with operations teams to ensure sustained system performance
Mentor junior engineers and provide technical leadership within the robotics team
Contribute to engineering standards, documentation, and continuous improvement processes

Fulltime

Network Engineer III

We are seeking talented, experienced Network Engineering professionals to join t...

Location

United States, Huntsville

Salary:

79119.18 - 190122.20 USD / Year

Arcfield

Expiration Date

Until further notice

Requirements

Bachelor's degree in Network Engineering, Computer Engineering, Computer Science, or a related technical field with 5-7 years of experience, an MS with 3-5 years of experience, or a PhD with 0-2 years of experience
Valid Security+ or DoD Directive 8570.01 IAT Level II certification
Valid Cisco Certified Network Associate (CCNA) certification or higher
3+ years of general work experience designing, developing, and implementing network architecture
3+ years of experience configuring and troubleshooting routers, switches, and associated network protocols such as OSPF, EIGRP, and Rapid PVST
3+ years of experience supporting network security hardware and solutions such as Cisco ASA firewalls, IPsec, Access Control Lists (ACLs), and Network Address Translation (NAT)
Experience with network analysis tools such as SolarWinds and Wireshark
Experience with Visio
Possess and maintain a Secret clearance

Job Responsibility

Develop and deploy communications architectures to meet dynamic mission requirements across multiple ranges
Design, install, maintain, and repair deployable communications sites, ensuring optimal performance and reliability
Route data with Type 1 encryption between various ranges and sites, ensuring secure and reliable communication
Configure and secure enterprise services, including routers, switches, firewalls, and access control solutions
Troubleshoot and optimize routing protocols such as OSPF and EIGRP, along with QoS, VPNs, and Spanning Tree Protocol
Interface with analog voice communications systems across multiple locations to ensure seamless integration
Maintain and update systems with patches to comply with Risk Management Framework (RMF) Information Assurance Vulnerability Management (IAVM) requirements
Collaborate with systems administrators on other standalone systems to improve their network architecture
Develop comprehensive system documentation, including standard operating procedures and network drawings
Regularly report project status to the Lead Network Engineer and Senior Management

What we offer

Health Insurance
Life Insurance
Paid Time Off
Holiday Pay
Short Term and Long-Term Disability
Retirement and Savings
Learning and Development opportunities
Wellness programs

Fulltime

Network Engineer III

We are seeking talented, experienced Network Engineering professionals to join t...

Location

United States, Huntsville

Salary:

Not provided

Arcfield

Expiration Date

Until further notice

Requirements

Bachelor's degree in Network Engineering, Computer Engineering, Computer Science, or a related technical field with 5-7 years of experience, an MS with 3-5 years of experience, or a PhD with 0-2 years of experience
Valid Security+ or DoD Directive 8570.01 IAT Level II certification
Valid Cisco Certified Network Associate (CCNA) certification or higher
3+ years of general work experience designing, developing, and implementing network architecture
3+ years of experience configuring and troubleshooting routers, switches, and associated network protocols such as OSPF, EIGRP, and Rapid PVST
3+ years of experience supporting network security hardware and solutions such as Cisco ASA firewalls, IPsec, Access Control Lists (ACLs), and Network Address Translation (NAT)
Experience with network analysis tools such as SolarWinds and Wireshark
Experience with Visio
Possess and maintain a Secret clearance

Job Responsibility

Develop and deploy communications architectures to meet dynamic mission requirements across multiple ranges
Design, install, maintain, and repair deployable communications sites, ensuring optimal performance and reliability
Route data with Type 1 encryption between various ranges and sites, ensuring secure and reliable communication
Configure and secure enterprise services, including routers, switches, firewalls, and access control solutions
Troubleshoot and optimize routing protocols such as OSPF and EIGRP, along with QoS, VPNs, and Spanning Tree Protocol
Interface with analog voice communications systems across multiple locations to ensure seamless integration
Maintain and update systems with patches to comply with Risk Management Framework (RMF) Information Assurance Vulnerability Management (IAVM) requirements
Collaborate with systems administrators on other standalone systems to improve their network architecture
Develop comprehensive system documentation, including standard operating procedures and network drawings
Regularly report project status to the Lead Network Engineer and Senior Management

Fulltime
