Site Reliability Engineer Sr. Staff Job at Hewlett Packard Enterprise (San Juan)

Site Reliability Engineer Sr. Staff

Designs, develops, troubleshoots and debugs software programs for software enhan...

Location

Puerto Rico, San Juan

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Minimum of 10 years of hands-on experience in Infra Ops, DevOps, or Site Reliability Engineering (SRE)
Proficiency with Linux systems, especially Debian-based distributions
Strong experience with cloud platforms such as AWS and GCP
Expertise in Infrastructure as Code tools like Terraform, Packer, and Ansible
Solid programming skills in Python and/or Golang
Deep understanding of containerization (Docker, containerd) and orchestration tools (AWS EKS, GCP GKE)
Experience with GitOps workflows
Proven track record in implementing and maintaining CI/CD pipelines
Strong background in security and familiarity with security programs
Experience with monitoring and logging tools (Prometheus, Grafana, ELK)

Job Responsibility

Enhance Infrastructure as Code (IaC) and enforce best practices
Optimize cloud infrastructure for scalability, security, and cost-effectiveness
Develop internal tools to support and streamline cloud platform operations
Improve CI/CD pipelines and deployment workflows using FluxCD and Jenkins
Address container image vulnerabilities and standardize remediation processes
Build Amazon Machine Images (AMIs) aligned with CIS and STIG benchmarks
Strengthen monitoring, alerting, and observability using Prometheus, Grafana, and logging tools
Troubleshoot complex production issues to ensure system reliability and customer satisfaction
Fine-tune distributed systems such as Apache Kafka and Cassandra
Collaborate with development, security, and operations teams to align infrastructure with application needs

What we offer

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

Fulltime

Sr Staff / Principal Site Reliability Engineer - Network & Security Operations

As a Site Reliability Engineer, you will be responsible for Palo Alto Networks’ ...

Location

United States, Santa Clara

Salary:

154000.00 - 249500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

8+ years of experience with IaC and infrastructure automation tools, including Terraform, Ansible, and CI/CD tools
Expert knowledge of cloud orchestration via GKE, EKS, etc., preferably on GCP
Experienced in designing and implementing Business Continuity Plans and Disaster Recovery Plans
Expert knowledge of firewall technologies (PANW preferred), including VPNs and routing
Advanced knowledge of shell scripting and programming languages such as Perl, Ruby, PHP, or Python
Advanced knowledge of DNS and DHCP, and Microsoft AD infrastructure
Strong analytical skills for interpreting business requirements and translating them into technical specifications
Strong project management, time management, and organizational skills
Excellent communication skills, including the ability to write network and security documentation, policies, and guidelines
Ability to work nights and weekends and provide 24/7 on-call support

Job Responsibility

Design, implement and provide support for IT infrastructure compute components
Install, support, and maintain software infrastructure according to best practices, including routers, load balancers, switches, Wi-Fi controllers, and firewalls, via Terraform/Ansible automation
Perform network security design and integration
Diagnose problems and solve issues, often under time constraints
Implement the necessary controls and procedures to protect information systems assets from intentional or inadvertent modification, disclosure, or destruction
Ensure system uptime and backup for all IT infrastructure
Provide security incident triage and response, including working with firewall and device logs, investigating security events, protecting the forensic value of data, and establishing monitoring, incident reporting, and response procedures
Work closely with engineering to help report issues and manage project deliverables and provide status and progress reports
Provide on-call support for Incident Management

What we offer

restricted stock units
bonus

Fulltime

Unix - Senior Cloud - Digital Engineering Sr. Staff Engineer

Location

India, Noida

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Should have a minimum of 8 to 10 years of experience as a Linux/Unix System Administrator. Should have expertise on at least 2 flavors of Unix. Linux is a must!
Should have a deep level of understanding of Linux OS & should be able to handle day to day admin tasks.
Should be well versed in shell scripting.
Expert in Unix-Linux, AWS Cloud Administration, OS/server administration, patching, maintenance, and troubleshooting.
Proficient in operating and troubleshooting AWS services like EC2, networking, RDS, backups, storage (EBS, EFS, S3, Glacier), and security (Well-Architected framework).
Possesses a strong understanding of networking concepts for configuring secure VPCs, subnets, landing zones, ACLs, and security groups.
Experience in end-to-end cloud migrations, including strategy, assessment, design, architecture, and execution on AWS.
Skilled in identifying and migrating suitable applications and workloads, gathering migration requirements, and collaborating with stakeholders.
Good knowledge of various AWS services like Lambda, SNS, SQS, DynamoDB, OpenSearch, Transfer Family, CloudWatch, EC2, EFS, EKS, Step Functions, ELB, ACM, Directory Services, and networking.
Hands-on expertise in designing, architecting, deploying, and supporting hybrid cloud environments.

Job Responsibility

Perform installation, customization, and maintenance of the Unix/Linux server operating system and system software products in support of business processing requirements for both on-premises and cloud environments
Evaluate and integrate new operating system versions, drivers and hardware.
Provides in-depth diagnosis for operating systems software/hardware failures and develops solutions.
Monitors and tunes the system to achieve optimum performance levels in standalone and multi-tiered environments.
Conducts system analysis, configuration management and develops improvements for system software performance, availability and reliability.
Implements appropriate levels of system security. Maintains security patching, remediates vulnerabilities, and proposes solutions for them.
Perform incident resolution, problem determination and root cause analysis in accordance with service level. Knowledge of ITIL.
Recommend and implement modifications, innovations, and ideas to improve the server environment
Prepare standard documents and periodically review them for modifications
Identifies opportunities for process and procedure enhancements to drive efficiency and customer service levels.

Fulltime

Sr Director, Maintenance & Reliability

Provides leadership, direction and strategies for maintenance function consistin...

Location

United States, Big Spring

Salary:

Not provided

Delek US

Expiration Date

Until further notice

Requirements

4 year / Bachelor's Degree (Required)
Ten (10) or more years of management experience (Required)
Fifteen (15) or more years of experience in maintenance for large production operations (Required)
General Equipment Maintenance & Repair
Preventative Maintenance
Inspection & Maintenance Procedures
Inspections & Audits
Materials Engineering
Materials Selection
Mechanical Properties

Job Responsibility

Actively participates, as member of refinery leadership team, in development of refinery's strategic and operational plans
Establishes Maintenance-specific objectives aligning with refinery's targets for safety, regulatory compliance, reliability, and efficiency
Ensures risks associated with Maintenance activities are appropriately managed
Directs efforts to improve effectiveness and efficiency while ensuring departmental activities are conducted in safe, environmentally sound and regulatory compliant manner
Manages development and execution of department's policies, programs and procedures to maximize operating efficiency
Ensures adoption of and adherence to engineering guidelines, industry standards and best practices
Manages budget and exercises financial stewardship to control expenditures
Promotes culture of continuous improvement
Accountable for the fiscal responsibility of the Maintenance department
Participates with Corporate on initiatives to improve reliability of the facility

What we offer

Up to a 10% 401(k) match from your hire date, with a vesting period of only one year
Medical benefits that start on day one with a 30% premium rebate annually
Access to the Calm app for FREE
Additional annual incentives through performance management program

Fulltime

Sr Platformization/Cloud Automation Engineer

Palo Alto Networks CDSS group is looking for a seasoned platformization and clou...

Location

United States, Santa Clara

Salary:

104600.00 - 169225.00 USD / Year

Palo Alto Networks Italia

Expiration Date

Until further notice

Requirements

Bachelors/Masters degree in Computer Science or a related field
5+ years of industry experience in engineering
Fluent scripting skills (preferably Python or Bash) with deep experience in Unix/Linux systems from kernel to shell and beyond
4+ years of working with Microservices architectures on Kubernetes
Hands-on experience with container-native tools like Docker and Helm for managing workloads running in Kubernetes
Experience managing AWS and GCP at scale, with knowledge of cloud-neutral connectivity between platforms
Experience designing and maintaining API specifications using Swagger/OpenAPI, and working with API frameworks such as Apigee to enable secure, scalable integrations
Hands-on experience with infrastructure-as-code and automation tools such as Terraform, Ansible, etc.
Proficient in CI/CD platforms like GitLab CI, Jenkins, ArgoCD, CircleCI, etc.
In-depth knowledge of operating systems (processes, threads, concurrency, etc.)

Job Responsibility

Work with development teams to ensure that applications have scalability and reliability built-in from day one
Design, review and enhance software architecture to improve scalability, service reliability, cost, and performance
Drive platformization by building standardized, self-service infrastructure platforms that improve developer productivity, scalability, and operational efficiency
Deploy automation for provisioning and operating infrastructure at large scale
Partner with teams to improve CI/CD processes and technology
Mentor members of the staff on large scale cloud deployments
Drive the adoption of observability practices and a data-driven mindset
Set up processes such as on-call rotations, postmortems, and runbooks to continue supporting the infrastructure owned by the SRE team while finding ways to reduce the time to resolution and improve the reliability of services
Support, optimize and deploy mission critical, front-end and back-end production
Improve site performance, monitoring, and overall stability of our infrastructure

Fulltime

Sr. Manager, Engineering - Process & Reliability

Complete oversight of BME operations for large, complex sites and/or multiple si...

Location

United States, Clifton

Salary:

143000.00 - 163000.00 USD / Year

Quest Diagnostics

Expiration Date

Until further notice

Requirements

Minimum of three (3) years of experience in a managerial role overseeing a service program (or similar)
Demonstrated understanding, experience, and leadership in Maintenance & Reliability, Computerized Maintenance Management Systems (CMMS), and Total Productive Maintenance (TPM) (6+ years)
Demonstrated understanding, experience, and leadership in continuous improvement, process management, project management and change management, including leading large or complex projects with multiple workstreams (6+ years)
Ability to navigate the facility and individual labs/sites
Ability to travel
Ability to sit or stand for extended periods of time
Ability to lift light to moderately heavy objects (1-10 lbs frequently, 11-25 lbs occasionally, 26-50 lbs seldom)
Must be able to work in a biohazard environment and comply with safety policies and procedures outlined in the Environmental Health & Safety Manual
Daily automation & high complexity operations in a regulated industry
BME technical expertise

Job Responsibility

Lead and optimize the regional implementation of the CMMS / EAM system across instrument platforms to track and trend equipment uptime and downtime, automate KPI measurement and metrics, and provide end-user training
Strategic guidance and collaboration with enterprise operations matrix leadership teams for implementation of Automation platforms, Operations excellence, Reliability, Vendor management, and key projects
Lead, develop, and manage overall operations and distribution of resources (staffing, budgets, and outside vendor services) of the BME program in collaboration & consultation with cross-functional stakeholders and business partners
Review, audit, and participate in decision support activities related to problem diagnosis, repair, preventive maintenance, and quality assurance of equipment
Participate in the development of annual goals and objectives related to supporting the growth and development of the equipment support services program (both locally and enterprise-wide)
Implement and manage large/complex projects (enterprise wide) utilizing operational excellence and project/program management skills
Develop and implement technical training for staff (i.e., onboarding materials, maintenance procedures)
Serve as a technical resource for the BME team and lab. Provides “best practices” to other enterprise-wide Quest sites and aids in their development
Oversees evaluation of equipment service needs and communicates with clinical equipment users on proper device use and safety
Evaluates maintenance and cost data related to laboratory equipment, to deliver expected service productivity and quality

What we offer

Day 1 Medical, supplemental health, dental & vision for FT employees who work 30+ hours
Best-in-class well-being programs
Annual, no-cost health assessment program Blueprint for Wellness
healthyMINDS mental health program
Vacation and Health/Flex Time
6 Holidays plus 1 "MyDay" off
FinFit financial coaching and services
401(k) pre-tax and/or Roth IRA with company match up to 5% after 12 months of service
Employee stock purchase plan
Life and disability insurance, plus buy-up option

Fulltime

Sr. Network Manager

We are looking for an experienced Sr. Network Manager to guide the performance, ...

Location

United States, Manchester

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

5+ years of experience in network engineering, network operations, or infrastructure leadership roles
Strong background with Cisco networking technologies, including routers, switching platforms, firewalls, and related enterprise solutions
Demonstrated experience in network design, architecture, and support within complex multi-site environments
Knowledge of Layer 3 networking concepts and protocols, along with secure connectivity solutions such as VPNs and disaster recovery networking
Proven ability to lead technical teams through coaching, workload management, hiring, and performance development
Experience troubleshooting advanced network issues and communicating technical findings clearly to both technical and business stakeholders
Familiarity with network security, wireless environments, and proactive monitoring tools used to maintain stable operations
Working knowledge of tools such as Microsoft Visio and other documentation platforms used for network planning and design

Job Responsibility

Lead the planning, administration, and enhancement of enterprise voice and data networks across corporate, retail, and distribution environments
Direct network engineering activities while supervising internal staff and offshore resources to maintain service quality and execution standards
Resolve complex network incidents and act as an escalation point for critical performance, availability, and security issues
Partner with business and technical teams to deliver infrastructure projects that improve resilience, scalability, and operational efficiency
Oversee routing, switching, wireless, SD-WAN, firewalls, and secure remote connectivity to support reliable enterprise communications
Manage network monitoring practices and operational processes to identify risks early and drive timely corrective action
Coordinate with vendors and service providers on support, procurement, lifecycle planning, and issue resolution
Support on-call escalation needs and participate in after-hours activities such as deployments, cutovers, and operational support when required

What we offer

medical
vision
dental
life and disability insurance
401(k) plan

Fulltime

Sr. Manager, Engineering & Maintenance

The Associate Director, Engineering & Maintenance is responsible for the overall...

Location

United States, Lakeland

Salary:

141900.00 - 195100.00 USD / Year

THE VAIL CORPORATION

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Engineering, Mechanical Engineering, Industrial Engineering, Electrical Engineering, or a related field
5+ years’ experience leading an Engineering or Maintenance team
5+ years managing engineering capital projects
Manufacturing experience

Job Responsibility

Progressive Maintenance: Responsible for the execution of the Progressive Maintenance / Equipment Reliability strategic plan that maintains the plant's electrical, mechanical, and control systems
Direct all reliability and maintenance activities and programs (preventative, predictive, and corrective) to ensure maximum operational potential
Engineering & Capital Management: Responsible for the development and execution of the site's strategic capital plan, along with proper resourcing for on-time, on-budget project execution
People Development: Develops a competent and efficient Engineering & Maintenance Department workforce. Assures employees have the experience, tools, supplies, and materials required for performing maintenance services. Creates a department staffing model to achieve required business outcomes
Department Budgeting: Monitors capital & expense budgets for the Engineering and Maintenance Departments. Manages use of labor and materials to optimize department effectiveness
Project Coordination: Oversees and manages major maintenance and project requirements with manufacturing production and engineering, including significant downtime activities
Health, Safety, and Employee Relations: Enforces company procedures and policies regarding safety and employee conduct, maintaining a work environment that promotes teamwork
Leadership and Team Management: Balancing the needs and development of a diverse team of engineers, technicians, and maintenance staff while ensuring high performance and morale
Project Management: Overseeing multiple projects simultaneously, ensuring they are completed on time, within budget, and to the required standards. This involves managing risks and unforeseen issues

What we offer

Medical
Dental
Short and long-term disability
AD&D
Life insurance
Matching 401(k) plan with immediate vesting
Unlimited sick time
Paid time off
Holiday pay
Free access to fitness center (if in WHQ)

Fulltime
