Site Reliability Engineer II

Robert Half

Location:
United States, Alpharetta

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are seeking a skilled Site Reliability Engineer (SRE) to join our team and help build, maintain, and scale cloud‑native infrastructure in Microsoft Azure. This role partners closely with development and operations teams to ensure systems are reliable, scalable, secure, and cost‑efficient. The ideal candidate is passionate about automation, infrastructure‑as‑code, GitOps, and observability, and thrives in a collaborative, fast‑paced environment. You will play a critical role in improving system resilience and establishing strong SRE practices from the ground up.

Job Responsibility:

  • Design, implement, and manage Azure cloud infrastructure using Terraform and Terragrunt
  • Maintain, operate, and optimize Kubernetes clusters on Azure Kubernetes Service (AKS)
  • Build and manage CI/CD pipelines using GitHub Actions / GitHub Workflows
  • Implement GitOps-based deployments using ArgoCD
  • Enhance system reliability by implementing monitoring, alerting, and observability solutions using Grafana
  • Automate operational tasks to reduce toil and improve team efficiency
  • Participate in on-call rotations, incident response, root cause analysis, and post-mortems
  • Partner with development teams to improve application performance, scalability, and resilience
  • Implement and promote SRE best practices, including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets
  • Continuously improve system performance, security posture, and cloud cost efficiency

Requirements:

  • 3+ years of experience in an SRE, DevOps, or Cloud Infrastructure role
  • Strong hands-on experience with Microsoft Azure
  • Infrastructure-as-Code experience using Terraform and Terragrunt
  • Experience designing and managing cloud-native environments
  • Proficiency with Kubernetes (preferably AKS)
  • Experience supporting containerized workloads and orchestration patterns
  • Exposure to Databricks environments is required
  • Experience with GitHub Actions / GitHub Workflows
  • Hands-on experience with ArgoCD and GitOps-based deployment strategies
  • Solid understanding of Grafana
  • Hands-on experience with Java in a production or platform context

Nice to have:

  • Experience with Prometheus
  • Familiarity with Loki and Tempo

What we offer:

  • Medical, vision, dental, life, and disability insurance
  • Eligibility to enroll in our company 401(k) plan

Additional Information:

Job Posted:
January 20, 2026

Similar Jobs for Site Reliability Engineer II

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location:
United States, Birmingham
Salary:
Not provided
Genuine Parts Company
Expiration Date
Until further notice
Requirements:
  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility:
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer:
  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay
Job Type:
Full-time

Expert Site Reliability Engineer

Expert Site Reliability Engineer provides technical expertise and strategic guid...
Location:
India
Salary:
Not provided
Altera Digital Health Inc. UK
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree (Preferred)
  • 8+ years relevant work experience
  • 5–7 years Expert level experience providing systems engineering in assigned product
  • 8+ years experience with healthcare products in a support, development or consultancy environment
  • Experience with Windows Server and IIS
  • Experience with SQL
  • Experience in Application support
Job Responsibility:
  • Provide continual technical guidance and support to the client on an ongoing basis
  • Collaborate with the internal technical teams to ensure successful implementation and integration of the proposed solutions
  • Collaborate with business stakeholders and TAM to understand business requirements and objectives
  • Design solutions that align with Hosting best practices, industry standards, and organizational business priorities
  • Develop and document overall technical architecture for the client
  • Design and document integration of various systems, components, and third-party services
  • Create architectural diagrams and documentation
  • Identify potential technical risks and provide mitigation strategies
  • Proactively address the challenges related to project deliverables and client environments
  • Review Control systems for your assigned client on a weekly basis and take appropriate actions to mitigate issues
Job Type:
Full-time

Senior Security Operations Engineer II

As a Senior Security Operations Engineer, you’ll play a key role in ensuring the...
Location:
United States, Scottsdale
Salary:
Not provided
Axon
Expiration Date
Until further notice
Requirements:
  • 7+ years of experience in operations, site reliability, or infrastructure engineering roles
  • Strong experience securing and managing cloud environments (e.g., AWS, Azure) and containerized workloads
  • Deep understanding of Linux systems, networking, distributed systems, and their associated security controls
  • Proficiency in automation, scripting, and security tooling integration to streamline operations and enforcement
  • Experience with security monitoring, alerting, SIEM platforms, and observability tools
  • Solid grasp of CI/CD practices with integrated security testing and compliance checks
  • Experience managing Kubernetes clusters and running containerized workloads in production
  • Experience deploying and administering any of the following: scalable cloud-native secrets solutions such as AWS KMS or Azure Key Vault; PKI solutions such as EJBCA, Smallstep, or Venafi; or vaulting solutions such as HashiCorp Vault
Job Responsibility:
  • Implementing and improving automated security checks in CI/CD pipelines to prevent vulnerabilities from reaching production
  • Writing, reviewing, and maintaining security-focused infrastructure-as-code for scalable and compliant deployments
  • Investigating security incidents, performing root cause analysis, and implementing long-term mitigation strategies
  • Collaborating with developers to develop new features, services, and infrastructure requirements
  • Enhancing security observability through improved log collection, metrics, and alerting configurations
  • Maintaining and improving security runbooks, incident response playbooks, and internal security tooling for operational efficiency
  • Resolve security and infrastructure incidents by taking part in high-impact, high-visibility incidents, ideally as incident commander
  • Maintain and secure critical infrastructure components such as PKI (Public Key Infrastructure) and IAM (Identity & Access Management) systems, ensuring reliability, scalability, and compliance with organizational and industry security standards
  • Build and maintain secure, reliable, and scalable infrastructure that protects core services and sensitive data
  • Troubleshoot and resolve complex operational and system-level issues across environments
What we offer:
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Job Type:
Full-time

Software Engineer II - CoreAI

As an AI Engineer on the CoreAI Platform team, you will apply artificial intelli...
Location:
United States, Redmond
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Design, build, and scale AI models to detect anomalies and identify regressions across large-scale AI systems
  • Analyze patterns in telemetry, logs, and real‑time signals to uncover root causes, predict failures, and drive proactive mitigations
  • Apply AI to identify emerging usage trends, performance hotspots, and workload irregularities that impact system health and user experience
  • Build lightweight automation that leverages anomaly detection signals and pattern analysis to improve live‑site reliability and engineering velocity
  • Contribute to hotfixes, performance tuning, and reliability improvements in production AI engines (e.g., GPU savings, SLA reliability, customer satisfaction)
  • Build intuitive, responsive UI components for AI dashboards and telemetry tools using React and modern web technologies
  • Communicate technical concepts with clarity and initiative, proactively seeking feedback and driving continuous improvement
  • Stay current with industry trends in applied AI, observability, and performance engineering
Job Type:
Full-time

Site Reliability Engineer II

Fivetran is looking for a high-performance engineer to be a part of a team of Si...
Location:
United States, Oakland
Salary:
133897.53 - 160683.46 USD / Year
Fivetran
Expiration Date
Until further notice
Requirements:
  • Knowledge of Cloud Platforms and related tooling: AWS, GCP, Azure, Terraform, configuration management
  • Experience in a scripting language
  • A strong foundation in Linux operating system internals and administration
  • Knowledge of Kubernetes
  • Familiarity with a relational database
Job Responsibility:
  • Responsible for monitoring the availability, capacity, and throughput of Fivetran's production infrastructure to identify and address potential issues
  • Collaborate with engineering teams to integrate reliability best practices into the product roadmap
  • Support the prioritization and resolution of critical bugs identified by support or sales
  • Contribute to maintaining 100% availability of production infrastructure by collaborating with engineering to implement automation for scalable deployments
  • Proactively monitor infrastructure vulnerabilities and collaborate with the security team to address them in a timely manner
What we offer:
  • 100% employer-paid medical insurance
  • Generous paid time-off policy (PTO), plus paid sick time, inclusive parental leave policy, holidays, and volunteer days off
  • RSU stock grants
  • Professional development and training opportunities
  • Company virtual happy hours, free food, and fun team-building activities
  • Monthly cell phone stipend
  • Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents
Job Type:
Full-time

Site Reliability Engineer II

Fivetran is looking for a high-performance engineer to be a part of a team of Si...
Location:
United States, Denver
Salary:
120507.78 - 144615.12 USD / Year
Fivetran
Expiration Date
Until further notice
Requirements:
  • Knowledge of Cloud Platforms and related tooling: AWS, GCP, Azure, Terraform, configuration management
  • Experience in a scripting language
  • A strong foundation in Linux operating system internals and administration
  • Knowledge of Kubernetes
  • Familiarity with a relational database
Job Responsibility:
  • Responsible for monitoring the availability, capacity, and throughput of Fivetran's production infrastructure to identify and address potential issues
  • Collaborate with engineering teams to integrate reliability best practices into the product roadmap
  • Support the prioritization and resolution of critical bugs identified by support or sales
  • Contribute to maintaining 100% availability of production infrastructure by collaborating with engineering to implement automation for scalable deployments
  • Proactively monitor infrastructure vulnerabilities and collaborate with the security team to address them in a timely manner
What we offer:
  • 100% employer-paid medical insurance
  • Generous paid time-off policy (PTO), plus paid sick time, inclusive parental leave policy, holidays, and volunteer days off
  • RSU stock grants
  • Professional development and training opportunities
  • Company virtual happy hours, free food, and fun team-building activities
  • Monthly cell phone stipend
  • Access to an innovative mental health support platform that offers personalized care and resources in areas such as: therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents
Job Type:
Full-time

Site Reliability Engineer II

Doctolib’s Engineering environment is rich and we are building innovative produc...
Location:
France, Nantes; Paris
Salary:
Not provided
Doctolib
Expiration Date
Until further notice
Requirements:
  • Solid hands-on experience (3y+) on a large-scale production platform
  • Proven experience with cloud platforms such as AWS, Azure or Google Cloud
  • Solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
  • Strong understanding of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows
  • Proficiency in at least one programming language (Ruby, Python, Go, Java, etc.) and a deep understanding of infrastructure as code principles
  • Experience with monitoring and observability tools
  • Enjoy troubleshooting performance issues in complex environments
  • Speak English
Job Responsibility:
  • Platform Reliability: Design, build, and maintain the core platform infrastructure to enable scalability and resilience
  • Automation and Efficiency: Develop tools and processes to automate the deployment, scaling, and lifecycle management of services
  • Monitoring and Incident Management: Implement robust monitoring, alerting, and incident response mechanisms
  • Disaster Recovery: Design and execute disaster recovery strategies
  • Collaborate with Feature Teams: Partner with product and engineering teams to embed reliability best practices
  • Continuous Improvement: Research and evaluate emerging technologies and tools
  • On-Call Ownership: Participate in an on-call rotation
What we offer:
  • Free Health Insurance for you & your family
  • Up to 14 days of RTT
  • Parental care program (1 month off in addition to legal parental leave, plus 0.5 days off per child when school starts)
  • Wellbeing program (free mental health and coaching offer with our partner moka.care)
  • A flexible workplace policy offering both hybrid and office-based mode
  • Flexibility days allowing you to work from EU countries and the UK for 10 days per year
  • Lunch voucher with Swile card
  • Work Council subsidy to refund part of sport club membership or creative class
  • Bicycle subsidy
Job Type:
Full-time