Site Reliability Engineer 3

United States, Chicago 99684.63 - 156647.28 USD / Year · Job Posted April 24, 2026

Job Description

FreeWheel is seeking an SRE to join the FreeWheel OPS team based in Denver, CO or Chicago, IL. As a member of the Global Operation team, you will be responsible for ensuring the reliability, scalability, and performance of FreeWheel systems. Working closely with engineers and other operation sub-teams, you will manage infrastructure, optimize system reliability, automate daily operations, and resolve technical issues that impact upstream and downstream platforms.

Job Responsibility

  • System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Participate in on-call shifts to respond to and resolve issues quickly
  • Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
  • Performance Optimization: Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed
  • Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
  • Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events
  • Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
  • Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
  • Cross-Team Collaboration: Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues

Requirements

  • 3+ years of experience as an SRE, DevOps or Operations Engineer
  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
  • Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus
  • Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, or Docker for automating system deployment
  • Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
  • System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools
  • Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
  • Proactive learner eager to grow in operations and governance
  • Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field

What we offer

  • Medical, prescription, vision, and dental insurance for eligible employees
  • 401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
  • Paid time off including eight observed company holidays and flex time
  • Exclusive perks + discounts, including tuition assistance, commuter benefits and more!

Similar Jobs for Site Reliability Engineer 3

8 matching positions

Site Reliability Engineer 3

We are seeking a skilled and motivated Site Reliability Engineer to join our tea...
Location: India, Chennai
Salary: Not provided
Company: Trimble Inc.
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in a relevant field of study (e.g., Computer Science, Computer Engineering, Software Engineering, Information Technology, Information Systems)
  • 7-10 years of relevant work experience
  • Hands-on experience automating and improving processes in a software development & production environment
  • Working knowledge of one or more programming languages, such as Python, Go, JavaScript, or similar
  • Ability to evaluate and troubleshoot technical issues with an attention to detail in problem-solving
  • Interest in automation and optimization of workflows for improved efficiency
  • Effective verbal and written communication skills
  • Ability to take on new challenges, with a willingness to receive both general and detailed instructions
  • Flexibility to adapt to evolving project requirements and timelines
  • This position requires a flexible schedule, which may include early mornings, late nights, and/or weekends to meet business needs
Job Responsibility
  • Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments
  • Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting
  • Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes
  • Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention
  • Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed
  • Lead incident response efforts, conduct root cause analysis, and implement long-term solutions for complex issues
  • Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks
  • Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles
  • Participate in on-call rotations and handle critical incidents with confidence and expertise
  • Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team

Site Reliability Engineer 3

FreeWheel is seeking an SRE to join Freewheel OPS team based in Denver, CO or Ch...
Location: United States, Chicago; Denver
Salary: 99684.63 - 149526.95 USD / Year
Company: Comcast Advertising
Expiration Date: Until further notice
Requirements
  • 3+ years of experience as an SRE, DevOps or Operations Engineer
  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
  • Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus
  • Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, or Docker for automating system deployment
  • Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
  • System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools
  • Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
  • Proactive learner eager to grow in operations and governance
  • Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field
Job Responsibility
  • System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Participate in on-call shifts to respond to and resolve issues quickly
  • Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
  • Performance Optimization: Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed
  • Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
  • Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events
  • Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
  • Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
  • Cross-Team Collaboration: Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues
What we offer
  • Medical, prescription, vision, and dental insurance for eligible employees
  • 401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
  • Paid time off including eight observed company holidays and flex time
  • Exclusive perks + discounts, including tuition assistance, commuter benefits and more
  • Full-time

Site Reliability Engineer II

Location: United States, Exton
Salary: Not provided
Company: Bentley Systems
Expiration Date: Until further notice
Requirements
  • U.S. Master of Science degree, or foreign equivalent, in Information Quality, Computer and Information Science, or a closely related field, and 3 years of DevOps Engineering experience
  • 3 years’ experience with Site Reliability Engineering and DevOps automation including designing, implementing and maintaining CI/CD pipelines for cloud-based production systems
Job Responsibility
  • Design, implement, and maintain automated cloud infrastructure and CI/CD pipelines to support enterprise software applications
  • Perform DevOps automation, Infrastructure as Code, and containerized deployments to improve system reliability, scalability, and operational efficiency while reducing manual intervention
  • Work with the Azure and Amazon Web Services (AWS) cloud platforms, including infrastructure provisioning, networking architecture, identity management, and security configuration
  • Develop and maintain IaC using Terraform, along with automation and scripting using Python or PowerShell, and configuration management using Ansible to support scalable and reliable cloud environments
  • Use containerization and orchestration technologies, including Docker, Kubernetes, and Helm, for deploying, scaling, and managing distributed cloud-native applications
  • Build and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) and participate in a rotating on-call schedule for production support

Sr. AI Site Reliability Engineer, AI.x

At Schwab, you will build a rewarding career while making a difference in the li...
Location: United States, Austin
Salary: 175000.00 - 220000.00 USD / Year
Company: Charles Schwab
Expiration Date: June 16, 2026
Requirements
  • 8+ years of software engineering experience, with 4+ years as a hands-on Site Reliability Engineer in startups and/or large organizations
  • Bachelor's degree in Computer Science or related field, or equivalent experience
  • 5+ years building complex products from scratch, running them in production, and ensuring operational reliability
  • 3+ years working with containers and cloud-native applications, operationalizing them in the public cloud with infrastructure as code and CI/CD pipelines
  • 3+ years of experience working in high-availability hybrid-cloud environments
Job Responsibility
  • Lead automation-first initiatives to eliminate toil and manual interventions, defining and executing the strategic roadmap for reliability, observability, and self-healing systems across AI.x platforms
  • Design and implement robust CI/CD pipelines enabling one-touch deployments with automated testing, validation, and rollback capabilities to accelerate delivery velocity and reduce deployment risk
  • Implement comprehensive observability frameworks for real-time monitoring of AI services, including metrics, logs, and traces, with intelligent alerting and automated diagnostics to minimize MTTD and MTTR
  • Participate in on-call rotation providing 24/7 support for production AI systems, ensuring rapid incident response, root cause analysis, and resolution with measurable SLO targets
  • Establish and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, and incident response runbooks to drive continuous reliability improvements
  • Champion Infrastructure-as-Code (IaC) practices and automate environment provisioning, configuration management, and deployment processes to ensure consistency, repeatability, and operational efficiency
  • Collaborate seamlessly with AI Engineering teams to integrate SRE practices early in the development lifecycle, promoting a culture of reliability and shared responsibility
  • Proactively identify and resolve reliability, performance, and scalability issues through data-driven analysis, capacity planning, and system optimization
  • Implement and maintain monitoring, alerting, and incident response frameworks to ensure system health and reliability, maximizing production availability
  • Champion reliability, monitoring, observability, and operational best practices for AI systems and data pipelines, establishing patterns and standards for the organization
What we offer
  • 401(k) with company match and Employee stock purchase plan
  • Paid time for vacation, volunteering, and a 28-day sabbatical after every 5 years of service for eligible positions
  • Paid parental leave and family building benefits
  • Tuition reimbursement
  • Health, dental, and vision insurance
  • Full-time

Principal Site Reliability Engineer

Substrate powers Microsoft 365. Keeping it up, resilient, and continuously impro...
Location: United States, Multiple Locations
Salary: 142800.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration. OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration. OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration. OR equivalent experience.
Job Responsibility
  • Incident management excellence: Lead high-severity incident response, debug complex issues, and drive incidents to resolution with clear communication and ownership. Ensure high-quality postmortem reports are created and enforce repair-item SLAs
  • Improve observability: Enhance telemetry, alerting, and dashboards using One Microsoft tooling to provide actionable insights and reduce detection time
  • Define and measure reliability: Partner with engineering teams to establish and track SLIs/SLOs for critical scenarios
  • Live site health reviews: Lead and facilitate live site health review meetings, translating business requirements into metrics and action
  • Engineering for prevention: Translate learnings into proactive tests, product fixes, rollout guardrails, and automation that reduce risk and improve service health
  • Reliability drills: Design and execute drills to simulate product failures, validate resilience and recovery, and develop resilience strategies
  • Define Policy: Draft process and policy documentation for how the organization prepares for, responds to, and prevents incidents
  • Full-time

Middle Site Reliability Engineer (CDN & DevOps)

Provectus is looking for a Senior DevOps or SRE professional to join our team an...
Salary: Not provided
Company: Provectus
Expiration Date: Until further notice
Requirements
  • 3+ years in a DevOps, SRE, or Web Operations role
  • Hands-on experience with a public CDN like Fastly, Varnish, Cloudflare, or Akamai, including writing cache and routing rules
  • Strong understanding of HTTP fundamentals such as cache headers, redirects, surrogate keys, and purge strategies
  • Experience with GitLab CI/CD pipelines and solid Linux administration skills
  • Scripting proficiency in Python or Bash and comfort using AI-assisted coding tools like Copilot or Claude
  • Upper-intermediate English with strong communication skills for a distributed team
Job Responsibility
  • Manage and improve CDN configuration, including caching rules, redirects, and traffic routing across a globally distributed edge
  • Triage CDN-related production issues through log analysis and performance investigations for high-traffic events
  • Review merge requests from product and platform engineering teams and advise on cache behavior and edge performance
  • Build and maintain CI/CD pipelines in GitLab CI for safely delivering CDN configuration changes
  • Partner with development teams across US and EU time zones to onboard new services behind the CDN
  • Maintain documentation, runbooks, and operational procedures while contributing to monitoring and alerting on edge traffic
What we offer
  • Opportunity to work with cutting-edge AI and cloud solutions
  • Internal training programs (Leadership, Public Speaking, and more) with full support for AWS and other professional certifications
  • Career growth: a clear path toward SA or beyond; we actively develop our engineers
  • Access to the latest AI tools and premium subscriptions
  • Long-term B2B collaboration
  • Remote with flexible hours
  • Private medical insurance or a budget for your medical needs
  • Paid sick leave, vacation, and public holidays
  • Equipment and all the tech you need for comfortable, productive work

Middle Site Reliability Engineer (CDN & DevOps)

Provectus is looking for a Senior DevOps or SRE professional to join our team an...
Location: Serbia; Spain; Poland; Yerevan, Armenia; Skopje, North Macedonia
Salary: Not provided
Company: Provectus
Expiration Date: Until further notice
Requirements
  • 3+ years in a DevOps, SRE, or Web Operations role
  • Hands-on experience with a public CDN like Fastly, Varnish, Cloudflare, or Akamai, including writing cache and routing rules
  • Strong understanding of HTTP fundamentals such as cache headers, redirects, surrogate keys, and purge strategies
  • Experience with GitLab CI/CD pipelines and solid Linux administration skills
  • Scripting proficiency in Python or Bash and comfort using AI-assisted coding tools like Copilot or Claude
  • Upper-intermediate English with strong communication skills for a distributed team
Job Responsibility
  • Manage and improve CDN configuration, including caching rules, redirects, and traffic routing across a globally distributed edge
  • Triage CDN-related production issues through log analysis and performance investigations for high-traffic events
  • Review merge requests from product and platform engineering teams and advise on cache behavior and edge performance
  • Build and maintain CI/CD pipelines in GitLab CI for safely delivering CDN configuration changes
  • Partner with development teams across US and EU time zones to onboard new services behind the CDN
  • Maintain documentation, runbooks, and operational procedures while contributing to monitoring and alerting on edge traffic
What we offer
  • Opportunity to work with cutting-edge AI and cloud solutions
  • Internal training programs (Leadership, Public Speaking, and more) with full support for AWS and other professional certifications
  • Career growth: a clear path toward SA or beyond; we actively develop our engineers
  • Access to the latest AI tools and premium subscriptions
  • Long-term B2B collaboration
  • Remote with flexible hours
  • Private medical insurance or a budget for your medical needs
  • Paid sick leave, vacation, and public holidays
  • Equipment and all the tech you need for comfortable, productive work
  • Full-time

Sr Site Reliability Engineer, Secure Federal Operations

This role is responsible for designing and implementing secure, scalable, and hi...
Location: United States, Herndon
Salary: 107300.00 - 193500.00 USD / Year
Company: T-Mobile
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, Information Technology, or related field plus 3 years of related work experience. Or, advanced degree with 1 year of related experience. Or, combination of education and experience deemed equivalent
  • 4+ years of progressive experience in systems architecture, platform engineering, or site reliability engineering
  • Hands-on experience with Azure and AWS cloud platforms
  • Expertise in Active Directory, DNS, 802.1X, and certificate lifecycle management
  • Strong background in Windows and Linux operating systems
  • Proficiency in TCP/IP networking and network security principles
  • Administration of Microsoft 365 (M365) services (Exchange Online, SharePoint, Teams)
  • US citizenship (without dual citizenship)
  • At least 18 years of age and legally authorized to work in the United States
  • Active security clearance or ability to obtain one
Job Responsibility
  • Develop and implement system designs to improve software delivery speed and operational efficiency
  • Lead architecture for cross-domain programs ensuring alignment with enterprise standards
  • Deliver solutions that enhance service availability, scalability, latency, and efficiency
  • Design and deploy solutions on Azure and AWS
  • Build and operate cloud-native platforms (Kubernetes, service mesh, ingress, policy engines)
  • Implement Infrastructure as Code (IaC) for automated deployments
  • Administer Active Directory and integrate with cloud identity solutions
  • Configure 802.1X authentication for secure network access
  • Manage the digital certificate lifecycle (issuance, renewal, revocation)
  • Manage DNS, TCP/IP networks, and network segmentation
What we offer
  • competitive base salary
  • annual stock grant
  • employee stock purchase plan
  • 401(k)
  • free year-round money coaches
  • medical insurance
  • dental insurance
  • vision insurance
  • flexible spending account
  • paid time off
  • Full-time