Site Reliability Engineer II

Axon

Location:
United States, Boston

Contract Type:
Not provided

Salary:

115500.00 - 184800.00 USD / Year

Job Description:

As a Site Reliability Engineer II within the APX SRE organization, you’ll focus on delivering practical, scalable solutions to support the reliability and performance of our mission-critical, cloud-native global Kubernetes platform and the services that run on it. You care deeply about system stability, clear documentation, and creating tools that improve the developer experience.

Job Responsibility:

  • Build robust, easy-to-use Kubernetes platforms and tools that enable engineering teams to provision and operate services rapidly, consistently, and securely
  • Exemplify cloud-native site reliability best practices
  • Write code that is performant, maintainable, clear, and concise
  • Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems
  • Influence and educate the engineering organization to adopt new and improved architectural patterns
  • Provide robust documentation for use by engineers to promote self-service
  • Continually seek improvements to our Kubernetes platform for better reliability, operability, and cost efficiency
  • Take calculated risks, champion new ideas, and cultivate your craft

Requirements:

  • U.S. citizenship (due to handling of classified federal data)
  • 3+ years of applicable experience in Platform engineering and container orchestration
  • Experience building platforms on clouds such as Azure and AWS
  • Experience building, operating, and innovating clustering solutions for Kubernetes platforms like AKS, EKS, or similar in production at scale
  • Experience with programming languages such as Python, Go, C#, Java, or similar
  • Experience with code collaboration tools such as GitHub, ArgoCD, or similar
  • Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases
  • Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
  • Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, Pulumi, or similar
  • Experience designing tooling to simplify the operational management of SaaS/PaaS systems
  • Familiarity with building flexible and testable Infrastructure as Code modules
  • Empathy to support the needs of software engineers

What we offer:

  • Competitive salary and 401k with employer match
  • Discretionary time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Development Programs
  • Snacks in our offices

Additional Information:

Job Posted:
February 17, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Site Reliability Engineer II

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location:
United States, Birmingham
Salary:
Not provided
Genuine Parts Company
Expiration Date
Until further notice
Requirements
  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer
  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay
Employment Type:
Full-time

Senior Security Operations Engineer II

As a Senior Security Operations Engineer, you’ll play a key role in ensuring the...
Location:
United States, Scottsdale
Salary:
Not provided
Axon
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in operations, site reliability, or infrastructure engineering roles
  • Strong experience securing and managing cloud environments (e.g., AWS, Azure) and containerized workloads
  • Deep understanding of Linux systems, networking, distributed systems, and their associated security controls
  • Proficiency in automation, scripting, and security tooling integration to streamline operations and enforcement
  • Experience with security monitoring, alerting, SIEM platforms, and observability tools
  • Solid grasp of CI/CD practices with integrated security testing and compliance checks
  • Experience managing Kubernetes clusters and running containerized workloads in production
  • Experience deploying and administering any of the following: scalable cloud-native secrets solutions such as AWS KMS or Azure Key Vault; PKI solutions such as EJBCA, Smallstep, or Venafi; or vaulting solutions such as HashiCorp Vault
Job Responsibility
  • Implementing and improving automated security checks in CI/CD pipelines to prevent vulnerabilities from reaching production
  • Writing, reviewing, and maintaining security-focused infrastructure-as-code for scalable and compliant deployments
  • Investigating security incidents, performing root cause analysis, and implementing long-term mitigation strategies
  • Collaborating with developers to develop new features, services, and infrastructure requirements
  • Enhancing security observability through improved log collection, metrics, and alerting configurations
  • Maintaining and improving security runbooks, incident response playbooks, and internal security tooling for operational efficiency
  • Resolve security and infrastructure incidents by taking part in high-impact, high-visibility incident response, ideally as incident commander
  • Maintain and secure critical infrastructure components such as PKI (Public Key Infrastructure) and IAM (Identity & Access Management) systems, ensuring reliability, scalability, and compliance with organizational and industry security standards
  • Build and maintain secure, reliable, and scalable infrastructure that protects core services and sensitive data
  • Troubleshoot and resolve complex operational and system-level issues across environments
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Employment Type:
Full-time

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...
Location:
Canada, Toronto
Salary:
115000.00 - 165000.00 CAD / Year
PagerDuty
Expiration Date
Until further notice
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Hands-on experience operating Linux-based systems in production environments
  • Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
  • Experience with container orchestration (e.g., EKS, Kubernetes)
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
  • Proficiency in at least one programming language (e.g., Python, Ruby, Go, etc.)
  • Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)
Job Responsibility
  • Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
  • Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
  • Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents
What we offer
  • Competitive salary
  • Comprehensive benefits package
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
Employment Type:
Full-time

Site Reliability Engineer

At Boeing, we innovate and collaborate to make the world a better place. We’re c...
Location:
Canada, Richmond
Salary:
103000.00 - 184000.00 CAD / Year
Boeing
Expiration Date
February 24, 2026
Requirements
  • 7+ years in software development or advanced technical support role
  • 5+ years of experience in site reliability engineering, DevOps, or a related role
  • Proven experience in site reliability engineering, DevOps, or a related role, with a track record of successfully implementing and managing infrastructure and deployment pipelines
  • Candidate must be eligible for authorization under the Canadian Government Controlled Goods Program (CGP) assessment
  • Must be able to obtain Canadian Secret Level II Security Clearance
  • Must be legally able to work in Canada
  • Individuals must not pose a risk for safeguarding of controlled goods
  • Must be eligible to handle US export-controlled data
  • Fluency in English language
Job Responsibility
  • Design, build, and maintain scalable and highly available infrastructure and processes using modern DevOps practices
  • Deploy and support customer installations, ensuring a smooth setup and integration of our hybrid multi-tenant SaaS solutions into their environments
  • Provide both reactive and proactive support to customers, addressing issues as they arise and implementing strategies to prevent future incidents
  • Lead incident response efforts, perform root cause analysis, and implement preventive measures to minimize downtime and service disruptions
  • Develop and enhance automation tools and scripts to streamline operations, reduce manual intervention, and improve efficiency
  • Set up and manage monitoring and alerting systems to proactively identify and resolve performance issues
  • Analyze system capacity and performance metrics to forecast future needs and ensure scalability of services
  • Collaborate with cross-functional teams to identify and implement new tools, technologies, and processes to enhance DevOps practices
  • Implement and advocate for “security best practices” to protect our applications and customer data
  • Pioneer and support special projects
What we offer
  • Competitive base pay and incentive programs
  • Industry-leading tuition assistance program pays your institution directly
  • Resources and opportunities to grow your career
  • Up to $10,000 match when you support your favorite nonprofit organizations
Employment Type:
Full-time

Site Reliability Engineer II

Site Reliability Engineer II - (Microsoft 365 Enterprise + Cloud). We are lookin...
Location:
Ireland, Dublin
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Mid-level years of software development: automation-related experience is most valued
  • Scripting languages such as bash, python, and PowerShell, or compiled languages such as C, C# are most relevant, but others are acceptable
  • Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes, microservices, and so on
  • Associated troubleshooting skills, including the ability to follow RPC (Remote Procedure Call) call-chains across arbitrary network steps
  • Consequent understanding of monitoring in distributed systems
  • Deep understanding of operating-system-level concepts such as processes, memory allocation, and the network stack; understanding of how applications are affected by these, and the ability to debug them
  • Experience with working in a team, including coordinating large projects, communicating well, and exercising initiative when presented with problems
  • Practical experience running large scale online systems is always an advantage
Job Responsibility
  • Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies
  • Identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance
  • Drives the adoption of innovative solutions across engineering teams working with related products within an organization
  • Apply advanced statistical and machine learning techniques to analyze large datasets and extract meaningful insights
  • Experience working with all service aspects of high throughput and multi-tenant services, ability to understand and design workflows carefully, properly handle errors, write clean and well-factored code with good tests and good maintainability
  • Engages with product engineering teams by partaking in code/design reviews, participating in on-call rotations and incident responses throughout product development and operations cycles
  • Leverages end-to-end technical expertise on underlying systems/platforms and insights from engagements with product engineering teams and telemetry analyses to propose scalable improvements in code and designs with attention to customer/business objectives and incident prevention
  • Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale
  • Reviews existing automation code and scripts to evaluate reusability, extendibility, and scalability within an organization
  • Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale
Employment Type:
Full-time

Site Reliability Engineer II

We are the Data Center Network Services team within Cisco IT, supporting network...
Location:
United States of America, Research Triangle Park, North Carolina
Salary:
109900.00 - 200100.00 USD / Year
Duo Security
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Engineering or Technology, with 0-3 years of experience in building, testing, or deploying scalable network applications
  • Strong programming skills, with expertise in Python and Ansible scripting
  • Hands-on experience with tools such as JIRA, Git, and Jenkins
  • Proficiency with Continuous Integration/Continuous Deployment (CI/CD) and pipeline setup
  • Solid understanding of software engineering concepts: data structures, algorithms, object-oriented programming, distributed systems, and cloud computing
Job Responsibility
  • Design, develop, test, and deploy new software capabilities for Data Center Networks
  • Collaborate with engineers across multiple disciplines and engage with internal clients
  • Deliver innovative, high-quality solutions that enhance the client experience
What we offer
  • Medical, dental and vision insurance
  • 401(k) plan with a Cisco matching contribution
  • Paid parental leave
  • Short and long-term disability coverage
  • Basic life insurance
  • 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
  • 1 paid day off for employee’s birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness
  • Non-exempt employees receive 16 days of paid vacation time per full calendar year
  • Exempt employees participate in Cisco’s flexible vacation time off program
  • 80 hours of sick time off provided on hire date and each January 1st thereafter
Employment Type:
Full-time

Site Reliability Engineer II

As a Site Reliability Engineer II within the APX SRE organization, you’ll focus ...
Location:
United States, Seattle
Salary:
115500.00 - 184800.00 USD / Year
Axon
Expiration Date
Until further notice
Requirements
  • U.S. citizenship (due to handling of classified federal data)
  • 3+ years of applicable experience in Platform engineering and container orchestration
  • Experience building platforms on clouds such as Azure and AWS
  • Experience building, operating, and innovating clustering solutions for Kubernetes platforms like AKS, EKS, or similar in production at scale
  • Experience with programming languages such as Python, Go, C#, Java, or similar
  • Experience with code collaboration tools such as GitHub, ArgoCD, or similar
  • Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases
  • Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
  • Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, Pulumi, or similar
  • Experience designing tooling to simplify the operational management of SaaS/PaaS systems
Job Responsibility
  • Build robust, easy-to-use Kubernetes platforms and tools that enable engineering teams to provision and operate services rapidly, consistently, and securely
  • Exemplify cloud-native site reliability best practices
  • Write code that is performant, maintainable, clear, and concise
  • Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems
  • Influence and educate the engineering organization to adopt new and improved architectural patterns
  • Provide robust documentation for use by engineers to promote self-service
  • Continually seek improvements to our Kubernetes platform for better reliability, operability, and cost efficiency
  • Take calculated risks, champion new ideas, and cultivate your craft
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Development Programs
  • Snacks in our offices
Employment Type:
Full-time