Site Reliability Engineer 3

United States, Chicago · 99684.63 - 149526.95 USD / Year · Job Posted March 19, 2026

Job Description

FreeWheel is seeking an SRE to join the FreeWheel OPS team based in Denver, CO or Chicago, IL. As a member of the Global Operations team, you will be responsible for ensuring the reliability, scalability, and performance of FreeWheel systems. Working closely with engineers and other operations sub-teams, you will manage infrastructure, optimize system reliability, automate daily operations, and resolve technical issues that impact upstream and downstream platforms.

Job Responsibility

  • System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Join the on-call rotation to respond to and resolve issues quickly
  • Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
  • Performance Optimization: Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed
  • Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
  • Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events
  • Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
  • Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
  • Cross-Team Collaboration: Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues

Requirements

  • 3+ years of experience as an SRE, DevOps or Operations Engineer
  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
  • Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus
  • Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, or Docker for automating system deployment
  • Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
  • System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools
  • Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
  • Proactive learner eager to grow in operations and governance
  • Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field

What we offer

  • Medical, prescription, vision, and dental insurance for eligible employees
  • 401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
  • Paid time off including eight observed company holidays and flex time
  • Exclusive perks + discounts, including tuition assistance, commuter benefits and more

Similar Jobs for Site Reliability Engineer 3

8 matching positions

Site Reliability Engineer 3

We are seeking a skilled and motivated Site Reliability Engineer to join our tea...
Location: India, Chennai
Salary: Not provided
Trimble Inc.
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in a relevant field of study (e.g., Computer Science, Computer Engineering, Software Engineering, Information Technology, Information Systems)
  • 7-10 years of relevant work experience
  • Hands-on experience automating and improving processes in a software development & production environment
  • Working knowledge of one or more programming languages, such as Python, Go, JavaScript, or similar
  • Ability to evaluate and troubleshoot technical issues with an attention to detail in problem-solving
  • Interest in automation and optimization of workflows for improved efficiency
  • Effective verbal and written communication skills
  • Ability to take on new challenges, with a willingness to receive both general and detailed instructions
  • Flexibility to adapt to evolving project requirements and timelines
  • This position requires a flexible schedule, which may include early mornings, late nights, and/or weekends to meet business needs
Job Responsibility
  • Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments
  • Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting
  • Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes
  • Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention
  • Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed
  • Lead incident response efforts, conduct root cause analysis, and implement long-term solutions for complex issues
  • Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks
  • Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles
  • Participate in on-call rotations and handle critical incidents with confidence and expertise
  • Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team

Site Reliability Engineer 3

FreeWheel is seeking an SRE to join Freewheel OPS team based in Denver, CO or Ch...
Location: United States, Chicago; Englewood
Salary: 99684.63 - 156647.28 USD / Year
Comcast Advertising
Expiration Date: Until further notice
Requirements
  • 3+ years of experience as an SRE, DevOps or Operations Engineer
  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
  • Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus
  • Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, or Docker for automating system deployment
  • Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
  • System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools
  • Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
  • Proactive learner eager to grow in operations and governance
  • Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field
Job Responsibility
  • System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Join the on-call rotation to respond to and resolve issues quickly
  • Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
  • Performance Optimization: Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed
  • Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
  • Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events
  • Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
  • Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
  • Cross-Team Collaboration: Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues
What we offer
  • Medical, prescription, vision, and dental insurance for eligible employees
  • 401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
  • Paid time off including eight observed company holidays and flex time
  • Exclusive perks + discounts, including tuition assistance, commuter benefits and more!
  • Fulltime

Site Reliability Engineer

We are currently seeking a Site Reliability Engineer to join our team in Guadala...
Location: Mexico, Guadalajara
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • Perform L1.5 activities such as monitoring, deployment, rollback
  • Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage
  • Troubleshoot Azure resources, escalate to Level 3 (Software Development Team)
  • Understand the Microsoft Azure Cloud - ideally Azure Fundamentals certified OR Computer Science/Information Systems Management degree
  • Familiar with PaaS and IaaS - VMs, Storage, EventHub, Service Fabric Cluster (SFC), Azure Kubernetes Service (AKS), CosmosDB, SQL Server, IoT Hub, Databricks, KeyVault, Datalake
  • Understand the concept of Internet of Things (IoT) - telemetry, ingestion, processing, data storage, reporting
  • Understand the concepts behind tools such as Octopus, Bamboo, Terraform, Azure DevOps, Jenkins, GitHub, and Ansible
  • Understand the concept of container orchestration platforms (e.g. Kubernetes)
  • Understand the concept of scripting: PowerShell, Python
  • Understand the difference between NoSQL and SQL databases, and how to maintain them
Job Responsibility
  • Perform L1.5 activities such as monitoring, deployment, rollback
  • Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage
  • Troubleshoot Azure resources, escalate to Level 3 (Software Development Team)
  • Fulltime

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...
Location: United States, Redmond
Salary: 102100.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Candidates must be able to meet Microsoft, customer and/or government security screening requirements
  • Candidates must have an active TS and be willing and eligible to upgrade to TS/SCI (with polygraph) or have an active TS/SCI and be willing and eligible to upgrade to TS/SCI (with polygraph)
  • Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • 2+ years technical experience working with large-scale cloud or distributed systems
  • Demonstrated experience applying software engineering principles to production systems, including designing, building, or improving services and platforms
  • Proficiency in one or more programming languages such as C#, Go, Java, or Python, with the ability to develop and maintain production-quality code
  • Experience with automation that results in measurable improvements (e.g., reduced toil, fewer manual steps, improved system reliability)
  • Experience with debugging and troubleshooting complex distributed systems in production environments
  • Ability to independently identify problems and implement solutions that improve system reliability and operational efficiency
Job Responsibility
  • Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
  • Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
  • Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
  • Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
  • Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
  • Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
  • Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems
  • Fulltime

Site Reliability Engineer

Microsoft is a company where passionate innovators come to collaborate, envision...
Location: India, Bangalore
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 3+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Must pass Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Own the end-to-end readiness of Event Stream across Azure regions, including onboarding new regions, driving deployment automation, and ensuring consistent, secure, and compliant service rollout
  • Work closely with platform, infrastructure, and partner teams (e.g., Event Hubs, Kusto, Fabric platform) to deliver resilient, low-latency streaming experiences on a global scale
  • Play a key role in advancing our reliability posture, improving availability, monitoring, and incident response across regions
  • Build strong observability, telemetry, and automated recovery mechanisms to meet high availability and SLA targets
  • Region Build-out & Deployment: Onboard new regions, drive deployment automation, and ensure consistent service configuration
  • Reliability & SRE: Improve availability, resiliency, and incident response; own service health across regions
  • Observability & Operations: Enhance telemetry, monitoring, alerting, and troubleshooting capabilities
  • Cross-team Collaboration: Partner with platform and infra teams to unblock dependencies and ensure smooth rollout
  • Production Excellence: Drive root-cause analysis, repair items, and continuous improvement on service reliability
  • Fulltime

Site Reliability Engineer II

Location: United States, Exton
Salary: Not provided
Bentley Systems
Expiration Date: Until further notice
Requirements
  • U.S. Master of Science degree, or foreign equivalent, in Information Quality, Computer and Information Science, or a closely related field, and 3 years of DevOps Engineering experience
  • 3 years’ experience with Site Reliability Engineering and DevOps automation including designing, implementing and maintaining CI/CD pipelines for cloud-based production systems
Job Responsibility
  • Responsible for designing, implementing, and maintaining automated cloud infrastructure and CI/CD pipelines to support enterprise software applications
  • Perform DevOps automation, Infrastructure as Code, and containerized deployments to improve system reliability, scalability, and operational efficiency while reducing manual intervention
  • Work with the Azure and Amazon Web Services (AWS) cloud platforms, including infrastructure provisioning, networking architecture, identity management, and security configuration
  • Develop and maintain IaC using Terraform, along with automation and scripting using Python or PowerShell, and configuration management using Ansible to support scalable and reliable cloud environments
  • Use containerization and orchestration technologies, including Docker, Kubernetes, and Helm, for deploying, scaling, and managing distributed cloud-native applications
  • Build and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) and participate in a rotating on-call schedule for production support

Principal Site Reliability Engineer

Substrate powers Microsoft 365. Keeping it up, resilient, and continuously impro...
Location: United States, Multiple Locations
Salary: 142800.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration. OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration. OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration. OR equivalent experience.
Job Responsibility
  • Incident management excellence: Lead high-severity incident response, debug complex issues, and drive incidents to resolution with clear communication and ownership. Ensure high-quality postmortem reports are created and enforce repair-item SLAs
  • Improve observability: Enhance telemetry, alerting, and dashboards using One Microsoft tooling to provide actionable insights and reduce detection time
  • Define and measure reliability: Partner with engineering teams to establish and track SLIs/SLOs for critical scenarios
  • Live site health reviews: Lead and facilitate live site health review meetings, translating business requirements into metrics and action
  • Engineering for prevention: Translate learnings into proactive tests, product fixes, rollout guardrails, and automation that reduce risk and improve service health
  • Reliability drills: Design and execute drills to simulate product failures, validate resilience and recovery, and develop resilience strategies
  • Define Policy: Draft process and policy documentation for how the organization prepares for, responds to, and prevents incidents
  • Fulltime

Middle Site Reliability Engineer (CDN & DevOps)

Provectus is looking for a Senior DevOps or SRE professional to join our team an...
Salary: Not provided
Provectus
Expiration Date: Until further notice
Requirements
  • 3+ years in a DevOps, SRE, or Web Operations role
  • Hands-on experience with a public CDN like Fastly, Varnish, Cloudflare, or Akamai, including writing cache and routing rules
  • Strong understanding of HTTP fundamentals such as cache headers, redirects, surrogate keys, and purge strategies
  • Experience with GitLab CI/CD pipelines and solid Linux administration skills
  • Scripting proficiency in Python or Bash and comfort using AI-assisted coding tools like Copilot or Claude
  • Upper Intermediate English with strong communication skills for a distributed team
Job Responsibility
  • Manage and improve CDN configuration including caching rules, redirects, and traffic routing across a globally distributed edge
  • Triage CDN-related production issues through log analysis and performance investigations for high-traffic events
  • Review merge requests from product and platform engineering teams and advise on cache behavior and edge performance
  • Build and maintain CI/CD pipelines in GitLab CI for safely delivering CDN configuration changes
  • Partner with development teams across US and EU time zones to onboard new services behind the CDN
  • Maintain documentation, runbooks, and operational procedures while contributing to monitoring and alerting on edge traffic
What we offer
  • Opportunity to work with cutting-edge AI and cloud solutions
  • Internal training programs (Leadership, Public Speaking, and more) with full support for AWS and other professional certifications
  • Career growth: a clear path toward SA or beyond; we actively develop our engineers
  • Access to the latest AI tools and premium subscriptions
  • Long-term B2B collaboration
  • Remote with flexible hours
  • Private medical insurance or a budget for your medical needs
  • Paid sick leave, vacation, and public holidays
  • Equipment and all the tech you need for comfortable, productive work