Site Reliability Engineer 3 Job at Comcast Advertising (Chicago)

Site Reliability Engineer 3

We are seeking a skilled and motivated Site Reliability Engineer to join our tea...

Location

India , Chennai

Salary:

Not provided

Trimble Inc.

Expiration Date

Until further notice

Requirements

Bachelor's degree in a relevant field of study (e.g., Computer Science, Computer Engineering, Software Engineering, Information Technology, Information Systems)
7-10 years of relevant work experience
Hands-on experience automating and improving processes in a software development & production environment
Working knowledge of one or more programming languages, such as Python, Go, JavaScript, or similar
Ability to evaluate and troubleshoot technical issues with an attention to detail in problem-solving
Interest in automation and optimization of workflows for improved efficiency
Effective verbal and written communication skills
Ability to take on new challenges, with a willingness to receive both general and detailed instructions
Flexibility to adapt to evolving project requirements and timelines
This position requires a flexible schedule, which may include early mornings, late nights, and/or weekends to meet business needs

Job Responsibility

Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments
Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting
Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes
Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention
Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed
Lead incident response efforts, conduct root cause analysis, and implement long-term solutions for complex issues
Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks
Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles
Participate in on-call rotations and handle critical incidents with confidence and expertise
Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team

Site Reliability Engineer 3

FreeWheel is seeking an SRE to join Freewheel OPS team based in Denver, CO or Ch...

Location

United States , Chicago; Englewood

Salary:

99684.63 - 156647.28 USD / Year

Comcast Advertising

Expiration Date

Until further notice

Requirements

3+ years of experience as an SRE, DevOps or Operations Engineer
Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus
Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus
Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, Docker for automating system deployment
Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools
System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools
Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders
Proactive learner eager to grow in operations and governance
Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field

Job Responsibility

System Monitoring and Optimization: Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Join on-call shifts to respond to and resolve issues quickly
Automation and Tool Development: Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
Performance Optimization: Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed
Incident Response and Troubleshooting: Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability
Capacity Planning and Scaling: Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events
Documentation and Knowledge Sharing: Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training
Security and Compliance: Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse
Cross-Team Collaboration: Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues

What we offer

Medical, prescription, vision, and dental insurance for eligible employees
401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
Paid time off including eight observed company holidays and flex time
Exclusive perks + discounts, including tuition assistance, commuter benefits and more!

Fulltime

Site Reliability Engineer

We are currently seeking a Site Reliability Engineer to join our team in Guadala...

Location

Mexico , Guadalajara

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Perform L1.5 activities such as monitoring, deployment, rollback
Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage
Troubleshoot Azure resources, escalate to Level 3 (Software Development Team)
Understand the Microsoft Azure Cloud - ideally Azure Fundamentals certified OR Computer Science/Information Systems Management degree
Familiar with PaaS and IaaS - VMs, Storage, EventHub, Service Fabric Cluster (SFC), Azure Kubernetes Service (AKS), CosmosDB, SQL Server, IoT Hub, Databricks, KeyVault, Datalake
Understand the concept of Internet of Things (IoT) - telemetry, ingestion, processing, data storage, reporting
Understand the concepts behind tools such as Octopus, Bamboo, Terraform, Azure DevOps, Jenkins, GitHub, and Ansible
Understand the concept of container orchestration platforms (e.g. Kubernetes)
Understand scripting concepts: PowerShell, Python
Understand the difference between NoSQL and SQL databases, and how to maintain them

Job Responsibility

Perform L1.5 activities such as monitoring, deployment, rollback
Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage
Troubleshoot Azure resources, escalate to Level 3 (Software Development Team)

Fulltime

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...

Location

United States , Redmond

Salary:

102100.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Candidates must be able to meet Microsoft, customer and/or government security screening requirements
Candidates must have an active TS and be willing and eligible to upgrade to TS/SCI (with polygraph) or have an active TS/SCI and be willing and eligible to upgrade to TS/SCI (with polygraph)
Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
2+ years technical experience working with large-scale cloud or distributed systems
Demonstrated experience applying software engineering principles to production systems, including designing, building, or improving services and platforms
Proficiency in one or more programming languages such as C#, Go, Java, or Python, with the ability to develop and maintain production-quality code
Experience with automation that results in measurable improvements (e.g., reduced toil, fewer manual steps, improved system reliability)
Experience with debugging and troubleshooting complex distributed systems in production environments
Ability to independently identify problems and implement solutions that improve system reliability and operational efficiency

Job Responsibility

Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems

Fulltime

Site Reliability Engineer

Microsoft is a company where passionate innovators come to collaborate, envision...

Location

India , Bangalore

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 3+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements
Must pass Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Own the end-to-end readiness of Event Stream across Azure regions, including onboarding new regions, driving deployment automation, and ensuring consistent, secure, and compliant service rollout
Work closely with platform, infrastructure, and partner teams (e.g., Event Hubs, Kusto, Fabric platform) to deliver resilient, low-latency streaming experiences on a global scale
Play a key role in advancing our reliability posture, improving availability, monitoring, and incident response across regions
Build strong observability, telemetry, and automated recovery mechanisms to meet high availability and SLA targets
Region Build-out & Deployment: Onboard new regions, drive deployment automation, and ensure consistent service configuration
Reliability & SRE: Improve availability, resiliency, and incident response; own service health across regions
Observability & Operations: Enhance telemetry, monitoring, alerting, and troubleshooting capabilities
Cross-team Collaboration: Partner with platform and infra teams to unblock dependencies and ensure smooth rollout
Production Excellence: Drive root-cause analysis, repair items, and continuous improvement on service reliability

Fulltime

Site Reliability Engineer II

Location

United States , Exton

Salary:

Not provided

Bentley Systems

Expiration Date

Until further notice

Requirements

U.S. Master of Science degree, or foreign equivalent, in Information Quality, Computer and Information Science, or a closely related field, and 3 years of DevOps Engineering experience
3 years’ experience with Site Reliability Engineering and DevOps automation including designing, implementing and maintaining CI/CD pipelines for cloud-based production systems

Job Responsibility

Design, implement, and maintain automated cloud infrastructure and CI/CD pipelines to support enterprise software applications
Perform DevOps automation, Infrastructure as Code, and containerized deployments to improve system reliability, scalability, and operational efficiency while reducing manual intervention
Work with the Azure and Amazon Web Services (AWS) cloud platforms, including infrastructure provisioning, networking architecture, identity management, and security configuration
Develop and maintain IaC using Terraform, along with automation and scripting using Python or PowerShell, and configuration management using Ansible to support scalable and reliable cloud environments
Use containerization and orchestration technologies, including Docker, Kubernetes, and Helm, for deploying, scaling, and managing distributed cloud-native applications
Build and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana) and participate in a rotating on-call schedule for production support

Principal Site Reliability Engineer

Substrate powers Microsoft 365. Keeping it up, resilient, and continuously impro...

Location

United States , Multiple Locations

Salary:

142800.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration. OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration. OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration. OR equivalent experience.

Job Responsibility

Incident management excellence: Lead high-severity incident response, debug complex issues, and drive incidents to resolution with clear communication and ownership. Ensure high-quality postmortem reports are created and enforce repair-item SLAs
Improve observability: Enhance telemetry, alerting, and dashboards using One Microsoft tooling to provide actionable insights and reduce detection time
Define and measure reliability: Partner with engineering teams to establish and track SLIs/SLOs for critical scenarios
Live site health reviews: Lead and facilitate live site health review meetings, translating business requirements into metrics and action
Engineering for prevention: Translate learnings into proactive tests, product fixes, rollout guardrails, and automation that reduce risk and improve service health
Reliability drills: Design and execute drills to simulate product failures, validate resilience and recovery, and develop resilience strategies
Define Policy: Draft process and policy documentation for how the organization prepares for, responds to, and prevents incidents

Fulltime

Middle Site Reliability Engineer (CDN & DevOps)

Provectus is looking for a Senior DevOps or SRE professional to join our team an...

Location

Salary:

Not provided

Provectus

Expiration Date

Until further notice

Requirements

3+ years in a DevOps, SRE, or Web Operations role
Hands-on experience with a public CDN like Fastly, Varnish, Cloudflare, or Akamai, including writing cache and routing rules
Strong understanding of HTTP fundamentals such as cache headers, redirects, surrogate keys, and purge strategies
Experience with GitLab CI/CD pipelines and solid Linux administration skills
Scripting proficiency in Python or Bash and comfort using AI-assisted coding tools like Copilot or Claude
Upper Intermediate English with strong communication skills for a distributed team

Job Responsibility

Manage and improve CDN configuration including caching rules, redirects, and traffic routing across a globally distributed edge
Triage CDN-related production issues through log analysis and performance investigations for high-traffic events
Review merge requests from product and platform engineering teams and advise on cache behavior and edge performance
Build and maintain CI/CD pipelines in GitLab CI for safely delivering CDN configuration changes
Partner with development teams across US and EU time zones to onboard new services behind the CDN
Maintain documentation, runbooks, and operational procedures while contributing to monitoring and alerting on edge traffic

What we offer

Opportunity to work with cutting-edge AI and cloud solutions
Internal training programs (Leadership, Public Speaking, and more) with full support for AWS and other professional certifications
Career growth: a clear path toward SA or beyond; we actively develop our engineers
Access to the latest AI tools and premium subscriptions
Long-term B2B collaboration
Remote with flexible hours
Private medical insurance or a budget for your medical needs
Paid sick leave, vacation, and public holidays
Equipment and all the tech you need for comfortable, productive work
