Site Reliability Engineer 3 Job at Comcast Advertising (Chicago)

Site Reliability Engineer 3

We are seeking a skilled and motivated Site Reliability Engineer to join our tea...

Location

India, Chennai

Salary:

Not provided

Trimble Inc.

Expiration Date

Until further notice

Requirements

Bachelor's degree in a relevant field of study (e.g., Computer Science, Computer Engineering, Software Engineering, Information Technology, Information Systems)
7-10 years of relevant work experience
Hands-on experience automating and improving processes in a software development & production environment
Working knowledge of one or more programming languages, such as Python, Go, JavaScript, or similar
Ability to evaluate and troubleshoot technical issues with an attention to detail in problem-solving
Interest in automation and optimization of workflows for improved efficiency
Effective verbal and written communication skills
Ability to take on new challenges, with a willingness to receive both general and detailed instructions
Flexibility to adapt to evolving project requirements and timelines
This position requires a flexible schedule, which may include early mornings, late nights, and/or weekends to meet business needs

Job Responsibility

Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments
Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting
Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes
Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention
Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed
Lead incident response efforts, conduct root cause analysis, and implement long-term solutions for complex issues
Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks
Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles
Participate in on-call rotations and handle critical incidents with confidence and expertise
Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team

Site Reliability Engineer 3

FreeWheel is seeking an SRE to join Freewheel OPS team based in Denver, CO or Ch...

Location

United States, Chicago; Denver

Salary:

99684.63 - 149526.95 USD / Year

Comcast Advertising

Expiration Date

Until further notice

Requirements

3+ years of experience as an SRE, DevOps or Operations Engineer
Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus
Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, or Docker for automating system deployment
Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools
Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
Proactive learner eager to grow in operations and governance
Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field

Job Responsibility

System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Join the on-call rotation to respond to and resolve issues quickly
Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
Performance Optimization: Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed
Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events
Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
Cross-Team Collaboration: Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues

What we offer

Medical, prescription, vision, and dental insurance for eligible employees
401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
Paid time off including eight observed company holidays and flex time
Exclusive perks + discounts, including tuition assistance, commuter benefits and more

Fulltime

Site Reliability Engineer II

Location

United States, Exton

Salary:

Not provided

Bentley Systems

Expiration Date

Until further notice

Requirements

U.S. Master of Science degree, or foreign equivalent, in Information Quality, Computer and Information Science, or a closely related field, and 3 years of DevOps Engineering experience
3 years’ experience with Site Reliability Engineering and DevOps automation including designing, implementing and maintaining CI/CD pipelines for cloud-based production systems

Job Responsibility

Design, implement, and maintain automated cloud infrastructure and CI/CD pipelines to support enterprise software applications
Perform DevOps automation, Infrastructure as Code, and containerized deployments to improve system reliability, scalability, and operational efficiency while reducing manual intervention
Work with the Azure and Amazon Web Services (AWS) cloud platforms, including infrastructure provisioning, networking architecture, identity management, and security configuration
Develop and maintain IaC using Terraform, automate and script using Python or PowerShell, and manage configuration using Ansible to support scalable and reliable cloud environments
Use containerization and orchestration technologies, including Docker, Kubernetes, and Helm, for deploying, scaling, and managing distributed cloud-native applications
Build and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) and participate in a rotating on-call schedule for production support

Sr. AI Site Reliability Engineer, AI.x

At Schwab, you will build a rewarding career while making a difference in the li...

Location

United States, Austin

Salary:

175000.00 - 220000.00 USD / Year

Charles Schwab

Expiration Date

June 16, 2026

Requirements

8+ years of software engineering experience, with 4+ years as a hands-on Site Reliability Engineer in startups and/or large organizations
Bachelor's degree in Computer Science or related field, or equivalent experience
5+ years building complex products from scratch, running them in production, and ensuring operational reliability
3+ years working with containers and cloud-native applications, operationalizing them in the public cloud with infrastructure as code and CI/CD pipelines
3+ years of experience working in high-availability hybrid-cloud environments

Job Responsibility

Lead automation-first initiatives to eliminate toil and manual interventions, defining and executing the strategic roadmap for reliability, observability, and self-healing systems across AI.x platforms
Design and implement robust CI/CD pipelines enabling one-touch deployments with automated testing, validation, and rollback capabilities to accelerate delivery velocity and reduce deployment risk
Implement comprehensive observability frameworks for real-time monitoring of AI services, including metrics, logs, and traces, with intelligent alerting and automated diagnostics to minimize MTTD and MTTR
Participate in on-call rotation providing 24/7 support for production AI systems, ensuring rapid incident response, root cause analysis, and resolution with measurable SLO targets
Establish and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, and incident response runbooks to drive continuous reliability improvements
Champion Infrastructure-as-Code (IaC) practices and automate environment provisioning, configuration management, and deployment processes to ensure consistency, repeatability, and operational efficiency
Collaborate seamlessly with AI Engineering teams to integrate SRE practices early in the development lifecycle, promoting a culture of reliability and shared responsibility
Proactively identify and resolve reliability, performance, and scalability issues through data-driven analysis, capacity planning, and system optimization
Implement and maintain monitoring, alerting, and incident response frameworks to ensure system health and reliability, maximizing production availability
Champion reliability, monitoring, observability, and operational best practices for AI systems and data pipelines, establishing patterns and standards for the organization

What we offer

401(k) with company match and Employee stock purchase plan
Paid time for vacation and volunteering, and a 28-day sabbatical after every 5 years of service for eligible positions
Paid parental leave and family building benefits
Tuition reimbursement
Health, dental, and vision insurance

Fulltime

Principal Site Reliability Engineer

Substrate powers Microsoft 365. Keeping it up, resilient, and continuously impro...

Location

United States, Multiple Locations

Salary:

142800.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration. OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration. OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration. OR equivalent experience.

Job Responsibility

Incident management excellence: Lead high-severity incident response, debug complex issues, and drive incidents to resolution with clear communication and ownership. Ensure high-quality postmortem reports are created and enforce repair-item SLAs
Improve observability: Enhance telemetry, alerting, and dashboards using One Microsoft tooling to provide actionable insights and reduce detection time
Define and measure reliability: Partner with engineering teams to establish and track SLIs/SLOs for critical scenarios
Live site health reviews: Lead and facilitate live site health review meetings, translating business requirements into metrics and action
Engineering for prevention: Translate learnings into proactive tests, product fixes, rollout guardrails, and automation that reduce risk and improve service health
Reliability drills: Design and execute drills to simulate product failures, validate resilience and recovery, and develop resilience strategies
Define Policy: Draft process and policy documentation for how the organization prepares for, responds to, and prevents incidents

Fulltime

Middle Site Reliability Engineer (CDN & DevOps)

Provectus is looking for a Senior DevOps or SRE professional to join our team an...

Location

Serbia; Spain; Poland; Yerevan, Armenia; Skopje, North Macedonia

Salary:

Not provided

Provectus

Expiration Date

Until further notice

Requirements

3+ years in a DevOps, SRE, or Web Operations role
Hands-on experience with a public CDN like Fastly, Varnish, Cloudflare, or Akamai, including writing cache and routing rules
Strong understanding of HTTP fundamentals such as cache headers, redirects, surrogate keys, and purge strategies
Experience with GitLab CI/CD pipelines and solid Linux administration skills
Scripting proficiency in Python or Bash and comfort using AI-assisted coding tools like Copilot or Claude
Upper-intermediate English with strong communication skills for a distributed team

Job Responsibility

Manage and improve CDN configuration including caching rules, redirects, and traffic routing across a globally distributed edge
Triage CDN-related production issues through log analysis and performance investigations for high-traffic events
Review merge requests from product and platform engineering teams and advise on cache behavior and edge performance
Build and maintain CI/CD pipelines in GitLab CI for safely delivering CDN configuration changes
Partner with development teams across US and EU time zones to onboard new services behind the CDN
Maintain documentation, runbooks, and operational procedures while contributing to monitoring and alerting on edge traffic

What we offer

Opportunity to work with cutting-edge AI and cloud solutions
Internal training programs (Leadership, Public Speaking, and more) with full support for AWS and other professional certifications
Career growth: a clear path toward SA or beyond; we actively develop our engineers
Access to the latest AI tools and premium subscriptions
Long-term B2B collaboration
Remote with flexible hours
Private medical insurance or a budget for your medical needs
Paid sick leave, vacation, and public holidays
Equipment and all the tech you need for comfortable, productive work

Fulltime

Sr Site Reliability Engineer, Secure Federal Operations

This role is responsible for designing and implementing secure, scalable, and hi...

Location

United States, Herndon

Salary:

107300.00 - 193500.00 USD / Year

T-Mobile

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Engineering, Information Technology, or related field plus 3 years of related work experience. Or, advanced degree with 1 year of related experience. Or, combination of education and experience deemed equivalent
4+ years of progressive experience in systems architecture, platform engineering, or site reliability engineering
Hands-on experience with Azure and AWS cloud platforms
Expertise in Active Directory, DNS, 802.1X, and certificate lifecycle management
Strong background in Windows and Linux operating systems
Proficiency in TCP/IP networking and network security principles
Administration of Microsoft 365 (M365) services (Exchange Online, SharePoint, Teams)
US citizenship (without dual citizenship)
At least 18 years of age and legally authorized to work in the United States
Active security clearance or ability to obtain one

Job Responsibility

Develop and implement system designs to improve software delivery speed and operational efficiency
Lead architecture for cross-domain programs ensuring alignment with enterprise standards
Deliver solutions that enhance service availability, scalability, latency, and efficiency
Design and deploy solutions on Azure and AWS
Build and operate cloud-native platforms (Kubernetes, service mesh, ingress, policy engines)
Implement Infrastructure as Code (IaC) for automated deployments
Administer Active Directory and integrate with cloud identity solutions
Configure 802.1X authentication for secure network access
Manage the digital certificate lifecycle (issuance, renewal, revocation)
Manage DNS, TCP/IP networks, and network segmentation

What we offer

competitive base salary
annual stock grant
employee stock purchase plan
401(k)
free year-round money coaches
medical insurance
dental insurance
vision insurance
flexible spending account
paid time off

Fulltime
