CrawlJobs Logo

Site Reliability Engineer 3

comcastcorporation.com Logo

Comcast

Location Icon

Location:
United States , Chicago

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

99684.63 - 149526.95 USD / Year

Job Description:

FreeWheel is seeking an SRE to join Freewheel OPS team based in Denver, CO or Chicago, IL. As a member of the Global Operation team, you will be responsible for ensuring the reliability, scalability, and performance of Freewheel systems. Working closely with engineers and other operation sub-teams, you will manage infrastructure, optimize system reliability, automate daily operations, and resolve technical issues that impact upstream/downstream platform.

Job Responsibility:

  • System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Join in on-call shift to quickly respond to and resolve issues
  • Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
  • Performance Optimization: Analyze and optimize the performance of data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, an improve processing speed
  • Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
  • Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support Freewheel powered Live events
  • Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
  • Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
  • Cross-Team Collaboration: Collaborate with engineering team, product team, and project management team to support product design and implementation, solving reliability-related issues

Requirements:

  • 3+ years of experience as an SRE, DevOps or Operations Engineer
  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
  • Hands-on experience with Terraform and infrastructure as code principle is a huge plus
  • Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, Docker for automating system deployment
  • Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
  • System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools
  • Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
  • Proactive learner eager to grow in operations and governance
  • Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field
What we offer:
  • Medical, prescription, vision, and dental insurance for eligible employees
  • 401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
  • Paid time off including eight observed company holidays and flex time
  • Exclusive perks + discounts, including tuition assistance, commuter benefits and more

Additional Information:

Job Posted:
March 18, 2026

Employment Type:
Fulltime
Work Type:
On-site work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Site Reliability Engineer 3

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location
Location
United States , Birmingham
Salary
Salary:
Not provided
genpt.com Logo
Genuine Parts Company
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility
Job Responsibility
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer
What we offer
  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer 2

Join us. At PagerDuty, you'll tackle complex problems, collaborate with kind and...
Location
Location
Portugal , Lisbon
Salary
Salary:
Not provided
https://www.pagerduty.com Logo
PagerDuty
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Experience with Kubernetes and container orchestration
  • Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
  • Proficiency in at least one programming language (e.g. Python, Ruby, Go, etc.)
  • Experience with Infrastructure as Code, (e.g. Terraform, Cloudformation)
Job Responsibility
Job Responsibility
  • Deploy, configure, monitor and optimize highly available Kubernetes clusters on AWS/EKS
  • Help maintain the overall health of the platform, including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
  • Continuously strive to improve the internal developer experience and the software development lifecycle
  • Stay current on technical trends to suggest innovative tools and approaches to interesting problems
  • Participate in a 24/7 on-call rotation
What we offer
What we offer
  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer

We’re looking for a passionate Site Reliability Engineer to pioneer our SRE stra...
Location
Location
Spain , Barcelona
Salary
Salary:
45000.00 - 59000.00 EUR / Year
edpuzzle.com Logo
Edpuzzle
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • At least 3 years of experience in Site Reliability Engineering, DevOps Engineering, System Administration or Cloud Infrastructure Engineering for a web-based product with a focus on observability and reliability
  • Good knowledge of Amazon Web Services (AWS), CloudWatch and Datadog
  • Experience with software release management and deployment pipelines (Git, CI/CD)
  • Experience with Infrastructure as Code using AWS CDK
  • Experience writing JavaScript, TypeScript or Node.js code
  • Pragmatic with technologies: you understand tech is a tool to solve a product problem, tech is never the end goal
  • Excellent ability to communicate your ideas, regardless of the audience
  • Product-oriented: You make all your technology decisions with the final user in mind
  • You are naturally drawn towards understanding the bigger picture and recognize when there's a need for improvement, applying your intentional and rational thought process to address complex issues
  • You are able to work independently, plan and exercise conscious control of time spent on specific goals to reach deadlines effectively, and you don’t hesitate to pursue a goal despite the difficulties, all while maintaining a flexible mindset
Job Responsibility
Job Responsibility
  • Work with the Product, Infrastructure and Engineering teams to find the best technical solutions by participating in discussions and sharing your opinions
  • Take ownership of the problems that are being worked on, understanding why they are needed by the users, carrying out your own research, making your own proposals and working on the implementation while relying on your teammates for help when needed
  • Communicate effectively in a team in order to maximize productivity, ownership, and focus to help projects reach the finish line with the best possible outcome and by the project deadline
  • Design a cloud infrastructure that is secure, scalable, and highly available on AWS
  • Engage in proactive monitoring and observability with comprehensive tools and practices that not only detect and warn, but also predict potential system issues before they affect our users
  • Lead the charge in root cause analysis for production and infrastructure issues, transforming challenges into learning opportunities
  • Provision, configure and maintain cloud infrastructure as code
  • Perform rotatory on-call service, ensuring reliability and uptime for our users
  • Write technical documentation, contributing to our technical knowledge base and empowering your peers
  • Perform other exciting duties as opportunities and needs arise.
What we offer
What we offer
  • On-call compensation
  • 24 days’ paid holidays plus December 24th and 31st
  • Flexible working hours and reduced working time on Fridays to support work-life balance
  • €2000 annual allowance for meals with Cobee
  • Private health insurance policy with AXA
  • Access to Wellhub to support physical and emotional well-being
  • Flexible remuneration for childcare
  • Flexible remuneration for public transport
  • Flexible remuneration for health insurance of immediate family members (spouse and/or children)
  • Training and development (CodelyTV, Cloud Academy, etc.)
  • Fulltime
Read More
Arrow Right

Site Reliability Engineering Manager

The Wikimedia Foundation is looking for an Engineering Manager to join our SRE t...
Location
Location
United States of America
Salary
Salary:
132439.00 - 208378.00 USD / Year
wikimediafoundation.org Logo
Wikimedia Foundation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Prior experience managing teams
  • Prior hands-on experience with software or reliability engineering (within the last 3 years preferred)
  • Ability to analyze complex systems, troubleshoot issues, and devise effective solutions under pressure
  • Proficiency in project management methodologies to effectively plan, execute, and track new and existing initiatives
  • Strong understanding of cloud computing, networking, Linux systems administration, containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible) to be able to provide technical support to the team
  • Aptitude for automation and streamlining of tasks
  • Communicate effectively in both spoken and written English
  • Ability to work independently, as an effective part of a globally distributed team
  • Ability to travel several times a year for occasional in-person meetings
  • B.S. or M.S. in Computer Science or the equivalent in related work experience
Job Responsibility
Job Responsibility
  • Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization
  • Providing guidance, mentorship, and support to ensure the team's effectiveness and growth
  • Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path
  • Recruiting, hiring, and helping onboard new team members
  • Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members
  • Coordinating and communicating with other members of the Wikimedia product & engineering teams on relevant projects, executing complex projects and contributing to the organizational strategy
  • Continuously developing the roadmap of the team in alignment with other SRE and Product & Technology teams, and helping to draft and execute the team’s annual and quarterly plans
  • Project managing new and existing initiatives
  • Leading the definition, refinement, and execution of the processes through which the team manages and performs work
  • Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer

We are looking for an engineer who is passionate about scaling Cloud services to...
Location
Location
Australia , Sydney
Salary
Salary:
Not provided
https://www.atlassian.com Logo
Atlassian
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production
  • 3+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Familiarity with Unix / Linux operating systems
  • Great emphasis to debug, improve code, and automate routine tasks
  • Backend engineering experience in one or more prominent languages such as Java, Go or Python
  • Strong communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
Job Responsibility
Job Responsibility
  • Scaling Cloud services
  • Owning the database infrastructure, tooling and automation that Jira Cloud runs on
  • Analyzing and improving services and processes to achieve higher levels of reliability, performance, scalability, and cost efficiency
What we offer
What we offer
  • Health coverage
  • Paid volunteer days
  • Wellness resources
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer

Atlassian is looking for an engineer passionate about scaling Cloud services to ...
Location
Location
Australia , Sydney
Salary
Salary:
Not provided
https://www.atlassian.com Logo
Atlassian
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
  • 3+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Familiarity with Unix / Linux operating systems
  • Great emphasis to debug, improve code, and automate routine tasks
  • Backend engineering experience in one or more prominent languages such as Java, Go or Python
  • Strong communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
Job Responsibility
Job Responsibility
  • Scaling Cloud services
  • Owning the database infrastructure, tooling and automation that Jira Cloud runs on
  • Analyzing and improving services and processes to achieve higher reliability, performance, scalability, and cost efficiency
What we offer
What we offer
  • Health coverage
  • Paid volunteer days
  • Wellness resources
  • Fulltime
Read More
Arrow Right

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...
Location
Location
Singapore , Singapore
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree or equivalent work experience
  • 3+ years of relevant work experience
  • Highly motivated self-starter with good interpersonal and communication skills
  • Certification or formal training in site reliability engineering concepts and practices would be beneficial
  • Prior experience working towards SLIs, SLOs and observability capabilities
  • 2+ years experience in Python alongside Linux based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience of k8s and container technologies such as Docker, Openshift and EKS
  • Experience with Secrets products such as HashiCorp Vault or CyberArk beneficial but not essential
  • Experience with CICD tools such as terraform, Jenkins, Ansible.
Job Responsibility
Job Responsibility
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production level incidents till resolution.
  • Fulltime
Read More
Arrow Right

Manager, Site Reliability Engineering and Incident Management

Planet DDS is seeking a Manager, Site Reliability Engineering and Incident Manag...
Location
Location
United States , Atlanta
Salary
Salary:
118000.00 - 160000.00 USD / Year
planetdds.com Logo
Planet DDS
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years in SRE, DevOps, or Infrastructure roles
  • 3+ years in Incident Management leadership
  • Deep understanding of reliability, scalability, and performance optimization
  • Multi-cloud expertise in AWS, Azure, or GCP
  • Understanding of DNS, load balancing, firewalls, and compliance frameworks
  • Knowledge of fundamental cloud security (e.g., identity and access management, firewalls)
  • Deep understanding of logging and monitoring and security best practices
  • Strong collaboration and communication skills
  • Bachelor’s Degree in a relevant major or equivalent years of experience is a plus
Job Responsibility
Job Responsibility
  • Lead and mentor a team of SREs and Incident Managers
  • Foster a culture of reliability, accountability, and continuous improvement
  • Collaborate with engineering teams to design resilient platform architectures
  • Oversee the incident response process for outages and service disruptions
  • Ensure timely detection, escalation, and resolution of incidents
  • Drive post-incident reviews (PIRs) and root cause analysis
  • Implement improvements based on lessons learned to prevent recurrence
  • Mature and enforce best practices for incident response and runbooks
  • Automate operational tasks to reduce toil and improve efficiency
  • Maintain observability tools (monitoring, alerting, logging)
  • Fulltime
Read More
Arrow Right