Site Reliability Engineer 2 Job at Microsoft Corporation (Redmond)

Site Reliability Engineer 2

FreeWheel is seeking a Junior DevOps / SRE 2 to join the FreeWheel OPS team based in...

Location

United States, Chicago; Englewood; Philadelphia

Salary:

84478.50 - 126717.75 USD / Year

Comcast Advertising

Expiration Date

Until further notice

Requirements

1-3 years of experience as an SRE, DevOps or Operations Engineer
Proficient in at least one programming language, such as Python, Go, Java, or Scala
Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, Docker
Familiar with monitoring and log management tools such as Prometheus, Grafana, ELK Stack
Excellent communication skills
Proactive learner eager to grow in operations and governance
Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field

Job Responsibility

Design and implement monitoring and alerting systems
Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery
Analyze and optimize the performance of data storage, query performance, and data flows
Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts
Work with engineering teams to analyze and forecast capacity requirements
Maintain consistent cloud standards and support enforcement of governance and compliance practices
Document the architecture, configurations, and operational procedures
Ensure platforms meet security standards and compliance requirements
Collaborate with engineering team, product team, and project management team

What we offer

Medical, prescription, vision, and dental insurance
401(k) savings plan with dollar-for-dollar matching up to the first 6% of your pay
Paid time off including eight observed company holidays and flex time
Tuition assistance
Commuter benefits

Fulltime

Site Reliability Engineer

We are currently seeking a Site Reliability Engineer to join our team in Guadala...

Location

Mexico, Guadalajara

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Perform L1.5 activities such as monitoring, deployment, rollback
Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage
Troubleshoot Azure resources, escalate to Level 3 (Software Development Team)
Understand the Microsoft Azure Cloud - ideally Azure Fundamentals certified OR Computer Science/Information Systems Management degree
Familiar with PaaS and IaaS - VMs, Storage, Event Hubs, Service Fabric Cluster (SFC), Azure Kubernetes Service (AKS), Cosmos DB, SQL Server, IoT Hub, Databricks, Key Vault, Data Lake
Understand the concept of Internet of Things (IoT) - telemetry, ingestion, processing, data storage, reporting
Understand the concepts behind tools such as Octopus, Bamboo, Terraform, Azure DevOps, Jenkins, GitHub, Ansible
Understand the concept of container orchestration platforms (e.g. Kubernetes)
Understand scripting concepts: PowerShell, Python
Understand the difference between NoSQL and SQL databases, and how to maintain them

Job Responsibility

Perform L1.5 activities such as monitoring, deployment, rollback
Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage
Troubleshoot Azure resources, escalate to Level 3 (Software Development Team)

Fulltime

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...

Location

United States, Redmond

Salary:

102100.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Candidates must be able to meet Microsoft, customer and/or government security screening requirements
Candidates must have an active TS clearance and be willing and eligible to upgrade to TS/SCI (with polygraph), or have an active TS/SCI clearance and be willing and eligible to upgrade to TS/SCI (with polygraph)
Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
2+ years technical experience working with large-scale cloud or distributed systems
Demonstrated experience applying software engineering principles to production systems, including designing, building, or improving services and platforms
Proficiency in one or more programming languages such as C#, Go, Java, or Python, with the ability to develop and maintain production-quality code
Experience with automation that results in measurable improvements (e.g., reduced toil, fewer manual steps, improved system reliability)
Experience with debugging and troubleshooting complex distributed systems in production environments
Ability to independently identify problems and implement solutions that improve system reliability and operational efficiency

Job Responsibility

Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems

Fulltime

Lead Site Reliability Engineer

Trimble is looking for a Site Reliability Engineering Lead to join Business Syst...

Location

India, Chennai

Salary:

Not provided

Trimble Inc.

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Engineering, Computer Science, or a related field
7+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles with at least 2+ years in a leadership or mentoring capacity
Deep AWS expertise (EC2, S3, RDS, IAM, VPC, Lambda, CloudFormation/Terraform, etc.)
Strong knowledge of Infrastructure-as-Code (IaC) using Terraform, AWS CDK, or CloudFormation
Proven experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, or similar)
Proficiency in containerization and orchestration (Docker, Kubernetes, ECS, or EKS)
Expertise in monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch, etc.)
Strong scripting or programming background (Python, Bash, or Go)
Sound understanding of networking, security, and identity/access management in the cloud
Experience designing high-availability and disaster recovery strategies for critical workloads

Job Responsibility

Become well-versed in the opportunities and challenges of the business and Trimble's customers
Become an expert in Business Systems services, especially the interfaces—APIs, protocols (e.g. OAuth), and user interfaces
Establish and then draw on close working relationships with stakeholders across the company, especially Trimble's engineering community
Prototype and create proofs of concept as required
Scope and deploy new integrations
Investigate, diagnose, and solve customer integration issues
Effectively communicate technical issues with stakeholders in non-technical language
Contribute to utilities and SDKs to help integration and migration efforts

Fulltime

Site Reliability Engineer

Shape the Future of Intelligent Operations as a Site Reliability Engineer (AI Op...

Location

India, Chennai

Salary:

Not provided

Trimble Inc.

Expiration Date

Until further notice

Requirements

1 to 2 years of professional experience in a DevOps, MLOps, or systems engineering environment
Bachelor's degree in Computer Science, Engineering, Information Technology, or a closely related technical field
Direct experience with Microsoft Azure cloud platforms and its specialized ecosystem services (such as Azure ML and Azure DevOps)
Proficiency with Python or other scripting languages (Shell / Bash / PowerShell) for rapid system integration and task automation
Foundational understanding of containerization (Docker), basic orchestration concepts (Kubernetes fundamentals), and version control system workflows (Git)
Solid baseline knowledge of fundamental DevOps principles (CI/CD, system administration) and a basic understanding of the end-to-end machine learning model lifecycle

Job Responsibility

Assist in the deployment and maintenance of machine learning models in production under direct supervision, building skills in containerization and orchestration architectures
Support the development of robust continuous integration and deployment pipelines for ML workflows, including model versioning, automated testing, and release processes
Monitor production ML model performance, detect data drift, and track system health by implementing foundational logging, alerting, and metrics solutions
Contribute to infrastructure automation and configuration management for machine learning workloads, learning to treat infrastructure as software
Partner closely with ML engineers and data scientists to operationalize complex models, ensuring reliability, scale, and strict adherence to established operational patterns

What we offer

Structured environment to accelerate technical skills
Direct guidance from experienced engineering professionals
Projects that improve productivity, quality, safety, transparency and sustainability
Collaborative and supportive team
Entrepreneurial spirit empowering proactive doers
Flexible work arrangements

Fulltime

Site Reliability Engineer II

Microsoft is a company where passionate innovators come to collaborate, envision...

Location

India, Hyderabad

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 2+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Work with all aspects of a high throughput and multi-tenant service
Collaborate effectively within the team and with partner teams across Microsoft
Be part of the on-call rotation for maintaining service health
Design, implement, and refine chosen solutions in close partnership with Product Management and partner teams
Champion operational excellence via established metrics, process governance, and policy controls for regular assessment and improvement
Document and define existing data engineering processes, data and technology, while evaluating them for optimization
System Reliability & Uptime – Ensuring high availability of services
Incident Management – Detecting, responding to, and mitigating system failures
Performance Monitoring – Tracking system health and resolving bottlenecks
Automation & Tooling – Reducing manual work through scripts and automation

Fulltime

Site Reliability Engineer

As a Site Reliability Engineer, you are passionate about experience innovation a...

Location

India, Bengaluru

Salary:

Not provided

Valtech

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
2+ years in DevOps, SRE, or Support Engineering roles
Experience with incident management in high-traffic, public-facing platforms
Strong scripting skills (Python, Bash, or PowerShell)
Familiarity with CI/CD tools: GitHub Actions, Azure DevOps, GitLab, Jenkins
Experience with monitoring/APM tools: Datadog, New Relic, Dynatrace, Prometheus, Grafana
Basic knowledge of serverless services in AWS, Azure, or GCP
Proficiency with Docker and containerized environments
Excellent English communication skills (B2+ level)
Experience working in international, cross-cultural teams

Job Responsibility

Maintain and improve observability systems (monitoring, logging, alerting)
Define, adjust, and maintain Service Level Objectives (SLOs)
Participate in incident resolution and on-call rotations (max 1 week/month)
Drive proactive reliability improvements across platforms
Collaborate with teams to analyze failure scenarios and implement mitigations
Create and maintain runbooks for incident response and prevention
Eliminate non-value-adding tasks through automation and process optimization

What we offer

Flexibility, with hybrid work options (country-dependent)
Learning and development, with access to cutting-edge tools, training and industry experts

Fulltime

Site Reliability Engineer - CTJ - Poly

We are seeking a Senior Site Reliability Engineer to lead a team that builds and...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Candidates must be able to meet Microsoft, customer and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screening: The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph
This position requires verification of U.S. citizenship due to citizenship-based legal restrictions
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Write secure, high-quality code that is maintainable, scalable, and performant
Architect, implement, and optimize hybrid and cloud infrastructure using Infrastructure as Code (e.g., containers, Bicep, Terraform, AKS, etc.) to improve availability, scale, security, and operational efficiency
Design and implement data governance, storage, backup, and disaster recovery for a multi-petabyte Azure environment, ensuring integrity, security, and performance
Build and operate large-scale data pipelines and data transformations to support analytics, governance, and operational needs
Evaluate emerging engineering tools and practices and incorporate them into the roadmap to continuously improve efficiency, reliability, and scale
Deliver automation to improve service health, manageability, reliability, telemetry, and alerting, with a focus on resiliency
Create and maintain clear technical documentation and design specifications aligned with best practices
Partner with engineering, project management, and operations to evolve services and optimize infrastructure in support of organizational goals
Participate in an on-call rotation to operate live services
Troubleshoot and mitigate complex issues, escalate as needed, and write post-incident reviews to share learnings

Fulltime
