Site Reliability Engineer II Job at Axon (Peachtree Corners)

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...

Location

United States, Birmingham

Salary:

Not provided

Genuine Parts Company

Expiration Date

Until further notice

Requirements

Bachelor's degree
Three (3) to five (5) years of related experience or an equivalent combination
Intermediate knowledge of appropriate networks, products, and protocols
Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
Troubleshooting skills
Problem solving skills
Demonstrated knowledge and adherence to Change Management processes
Ability to interface well with customers, end users, partners, and associates

Job Responsibility

Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
Ensures that the company network is operational
Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
Helps to negotiate and place orders with common carriers
Performs other duties as assigned

What we offer

Healthcare coverage
401(k)
Tuition reimbursement
Vacation
Sick pay
Holiday pay

Full-time

Senior Security Operations Engineer II

As a Senior Security Operations Engineer, you’ll play a key role in ensuring the...

Location

United States, Scottsdale

Salary:

Not provided

Axon

Expiration Date

Until further notice

Requirements

7+ years of experience in operations, site reliability, or infrastructure engineering roles
Strong experience securing and managing cloud environments (e.g., AWS, Azure) and containerized workloads
Deep understanding of Linux systems, networking, distributed systems, and their associated security controls
Proficiency in automation, scripting, and security tooling integration to streamline operations and enforcement
Experience with security monitoring, alerting, SIEM platforms, and observability tools
Solid grasp of CI/CD practices with integrated security testing and compliance checks
Experience managing Kubernetes clusters and running containerized workloads in production
Experience deploying and administering any of the following: scalable cloud-native secrets solutions such as AWS KMS or Azure Key Vault; PKI solutions such as EJBCA, Smallstep, or Venafi; or vaulting solutions such as HashiCorp Vault

Job Responsibility

Implementing and improving automated security checks in CI/CD pipelines to prevent vulnerabilities from reaching production
Writing, reviewing, and maintaining security-focused infrastructure-as-code for scalable and compliant deployments
Investigating security incidents, performing root cause analysis, and implementing long-term mitigation strategies
Collaborating with developers to develop new features, services, and infrastructure requirements
Enhancing security observability through improved log collection, metrics, and alerting configurations
Maintaining and improving security runbooks, incident response playbooks, and internal security tooling for operational efficiency
Resolving high-impact, high-visibility security and infrastructure incidents as a participant and, ideally, as an incident commander
Maintaining and securing critical infrastructure components such as PKI (Public Key Infrastructure) and IAM (Identity & Access Management) systems, ensuring reliability, scalability, and compliance with organizational and industry security standards
Building and maintaining secure, reliable, and scalable infrastructure that protects core services and sensitive data
Troubleshooting and resolving complex operational and system-level issues across environments

What we offer

Competitive salary and 401k with employer match
Discretionary paid time off
Paid parental leave for all
Medical, Dental, Vision plans
Fitness Programs
Emotional & Mental Wellness support
Learning & Development programs
Snacks in our offices

Full-time

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...

Location

Canada, Toronto

Salary:

115000.00 - 165000.00 CAD / Year

PagerDuty

Expiration Date

Until further notice

Requirements

3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
Hands-on experience operating Linux-based systems in production environments
Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
Experience with container orchestration (e.g., EKS, Kubernetes)
Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
Proficiency in at least one programming language (e.g., Python, Ruby, Go, etc.)
Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)

Job Responsibility

Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
Stay current on technical trends to suggest innovative tools and approaches to interesting problems
Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents

What we offer

Competitive salary
Comprehensive benefits package
Flexible work arrangements
Company equity
ESPP (Employee Stock Purchase Program)
Retirement or pension plan
Generous paid vacation time
Paid holidays and sick leave
Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent

Full-time

Site Reliability Engineer II

The IDEAS organization’s mission is to unlock the power of data to deliver actio...

Location

United States, Redmond

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration, OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration, OR equivalent experience
Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Experience with automation, live site operations, and incident response in large-scale cloud or distributed systems
Proficiency in at least one programming or scripting language (for example: C#, Java, Python, or PowerShell)
Strong analytical and problem-solving skills, including experience using telemetry and operational data to inform decisions
Effective written and verbal communication skills, and experience collaborating across teams and disciplines
Ability to meet Microsoft, customer, and/or government security screening requirements, including passing the Microsoft Cloud Background Check upon hire and periodically thereafter
The successful candidate must have an active U.S. Government Secret Security Clearance

Job Responsibility

Participate as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within defined SLAs, and contributing to post-incident reviews and learning
Design, build, and maintain automation for deployment, operations, and incident mitigation to improve reliability and reduce manual effort
Instrument services for observability; collect and analyze telemetry and health signals; and use data to guide reliability and performance improvements
Collaborate with engineering partners and stakeholders to align on goals, share operational insights, and deliver user-focused solutions
Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements
Support compliance with security, privacy, and accessibility requirements throughout service onboarding and ongoing operations
Continuously learn and adopt industry practices and internal tools to improve reliability, performance, and observability

Full-time

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...

Location

United States, Multiple Locations

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration, OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration, OR equivalent experience
Active U.S. Government Top Secret Security Clearance
Ability to pass Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems

Full-time

Site Reliability Engineer II

Site Reliability Engineer II - (Microsoft 365 Enterprise + Cloud). We are lookin...

Location

Ireland, Dublin

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Mid-level software development experience; automation-related experience is most valued
Scripting languages such as Bash, Python, and PowerShell, or compiled languages such as C and C#, are most relevant, but others are acceptable
Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes, microservices, and so on
Associated troubleshooting skills, including the ability to follow RPC (Remote Procedure Call) call-chains across arbitrary network steps
Consequent understanding of monitoring in distributed systems
Deep understanding of operating-system-level concepts such as processes, memory allocation, and the network stack; understanding of how applications are affected by these, and the ability to debug them
Experience with working in a team, including coordinating large projects, communicating well, and exercising initiative when presented with problems
Practical experience running large scale online systems is always an advantage

Job Responsibility

Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies; identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance
Drives the adoption of innovative solutions across engineering teams working with related products within an organization
Apply advanced statistical and machine learning techniques to analyze large datasets and extract meaningful insights
Experience working with all service aspects of high throughput and multi-tenant services, ability to understand and design workflows carefully, properly handle errors, write clean and well-factored code with good tests and good maintainability
Engages with product engineering teams by partaking in code/design reviews, participating in on-call rotations and incident responses throughout product development and operations cycles; leverages end-to-end technical expertise on underlying systems/platforms and insights from engagements with product engineering teams and telemetry analyses to propose scalable improvements in code and designs with attention to customer/business objectives and incident prevention
Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale; reviews existing automation code and scripts to evaluate reusability, extensibility, and scalability within an organization
Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale

Full-time
