Site Reliability Engineer II Job at Bentley Systems (Exton)

Site Reliability Engineer II

Microsoft is a company where passionate innovators come to collaborate, envision...

Location

India, Hyderabad

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 2+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Job Responsibility

Work with all aspects of a high throughput and multi-tenant service
Collaborate effectively within the team and with partner teams across Microsoft
Be part of the on-call rotation for maintaining service health
Design, implement, and refine chosen solutions in close partnership with Product Management and partner teams
Champion operational excellence via established metrics, process governance, and policy controls for regular assessment and improvement
Document and define existing data engineering processes, data and technology, while evaluating them for optimization
System Reliability & Uptime – Ensuring high availability of services
Incident Management – Detecting, responding to, and mitigating system failures
Performance Monitoring – Tracking system health and resolving bottlenecks
Automation & Tooling – Reducing manual work through scripts and automation

Fulltime

Site Reliability Engineer II

Location

Canada

Salary:

170000.00 - 200000.00 CAD / Year

Axon

Expiration Date

Until further notice

Requirements

5+ years of applicable experience
Experience managing cloud platforms such as Azure, AWS, or similar
Experience using managed languages such as Python, Go, C#, Java, or similar
Experience operating in Kubernetes platforms like AKS, EKS, or similar
Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases
Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
Experience using Infrastructure as Code tools for provisioning infrastructure such as Terraform, AWS CloudFormation, or similar
Builder-operator mindset with proven production ownership (uptime, SLOs, on-call, incident leadership)
Empathy to support the needs of software engineers

Job Responsibility

Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, securely, and cost-effectively
Exemplify cloud-native site reliability best practices
Write code that is performant, maintainable, clear, and concise
Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems
Influence and educate the engineering organization to adopt new and improved architectural patterns
Provide robust documentation for use by engineers to promote self-service
Take calculated risks, champion new ideas, and cultivate your craft

Fulltime

Site Reliability Engineer II

The IDEAS organization’s mission is to unlock the power of data to deliver actio...

Location

United States, Redmond

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Experience with automation, live site operations, and incident response in large-scale cloud or distributed systems
Proficiency in at least one programming or scripting language (for example: C#, Java, Python, or PowerShell)
Strong analytical and problem-solving skills, including experience using telemetry and operational data to inform decisions
Effective written and verbal communication skills, and experience collaborating across teams and disciplines
Ability to meet Microsoft, customer, and/or government security screening requirements, including passing the Microsoft Cloud Background Check upon hire and periodically thereafter
The successful candidate must have an active U.S. Government Secret Security Clearance

Job Responsibility

Participate as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within defined SLAs, and contributing to post-incident reviews and learning
Design, build, and maintain automation for deployment, operations, and incident mitigation to improve reliability and reduce manual effort
Instrument services for observability; collect and analyze telemetry and health signals; and use data to guide reliability and performance improvements
Collaborate with engineering partners and stakeholders to align on goals, share operational insights, and deliver user-focused solutions
Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements
Support compliance with security, privacy, and accessibility requirements throughout service onboarding and ongoing operations
Continuously learn and adopt industry practices and internal tools to improve reliability, performance, and observability

Fulltime

Site Reliability Engineer II

Are you interested in working on cutting-edge cloud security products? Would you...

Location

United States, Multiple Locations

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience
Active U.S. Government Top Secret Security Clearance
Ability to pass Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Live Site Operations: Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health and responding to incidents within SLA timelines
Automation & Deployment: Contribute to automation efforts and validate code functionality in non-production environments to ensure smooth deployments
Compliance & Security: Support compliance processes by verifying security, privacy, and accessibility standards during onboarding of new technologies
Continuous Learning: Stay current with industry trends and internal tools to improve reliability, performance, and observability at scale
Engineering Best Practices: Apply proven development and scaling practices to meet performance and customer requirements
Cross-Team Collaboration: Communicate effectively with engineering partners to align on goals and deliver user-centric solutions
Incident Response & Postmortems: Address complex live site issues, implement mitigations, and document learnings through postmortems

Fulltime

Site Reliability Engineer II

As an intermediate Site Reliability Engineer on the Core Infrastructure team in ...

Location

Canada, Toronto

Salary:

115000.00 - 165000.00 CAD / Year

PagerDuty

Expiration Date

Until further notice

Requirements

3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
Hands-on experience operating Linux-based systems in production environments
Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow
Experience with container orchestration (e.g., EKS, Kubernetes)
Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts
Proficiency in at least one programming language (e.g., Python, Ruby, Go, etc.)
Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)

Job Responsibility

Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems
Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities
Participate in agile rituals (standups, planning, retros) and communicate progress/risks early
Stay current on technical trends to suggest innovative tools and approaches to interesting problems
Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents

What we offer

Competitive salary
Comprehensive benefits package
Flexible work arrangements
Company equity
ESPP (Employee Stock Purchase Program)
Retirement or pension plan
Generous paid vacation time
Paid holidays and sick leave
Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent

Fulltime

Site Reliability Engineer II

We are seeking an experienced Site Reliability Engineer II to help build, mainta...

Location

United States, Alpharetta

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

3+ years experience in SRE, DevOps, or Cloud Infrastructure roles
Strong hands-on experience with Microsoft Azure services
Advanced experience with Terraform and Terragrunt
Proficiency with Kubernetes/AKS and container orchestration
Experience with CI/CD tools including GitHub Actions and ArgoCD
Solid understanding of observability tooling, especially Grafana
Hands-on experience with Java environments (for app debugging/support)

Job Responsibility

Design, implement, and manage Azure cloud infrastructure using Terraform and Terragrunt
Maintain, monitor, and optimize Kubernetes clusters (AKS)
Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD in a GitOps model
Enhance reliability through monitoring, alerting, and observability using Grafana (experience with Prometheus, Loki, and Tempo is a plus)
Automate operational tasks to reduce manual toil
Participate in on-call rotations, incident response, and post-mortem reviews
Collaborate with development teams to improve application reliability, performance, and scalability
Implement and advocate for SRE practices including SLIs, SLOs, and error budgets
Continuously improve infrastructure performance, cost efficiency, and security posture

What we offer

medical
vision
dental
life and disability insurance
company 401(k) plan

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...

Location

United States, Birmingham

Salary:

Not provided

Alliance Automotive UK LV Ltd

Expiration Date

Until further notice

Requirements

Typically requires a bachelor's degree and three (3) to five (5) years of related experience or an equivalent combination
Intermediate knowledge of appropriate networks, products, and protocols
Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
Troubleshooting skills
Problem solving skills
Demonstrated knowledge and adherence to Change Management processes
Ability to interface well with customers, end users, partners, and associates

Job Responsibility

Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
Responsible for making sure that the company network works
Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
Helps to negotiate and place orders with common carriers
Performs other duties as assigned

What we offer

options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay

Fulltime

Site Reliability Engineer II

As an SRE Engineer II, you will be responsible for managing our multi-cloud infr...

Location

United States, Sunnyvale

Salary:

138000.00 - 159000.00 USD / Year

Illumio

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience
2+ years of experience working as an SRE, DevOps Engineer, or similar role, with hands-on experience in Azure cloud platform in a production environment setting
Proficiency in scripting and programming languages such as PowerShell, Python, or Go for automation and infrastructure management tasks
Experience with CI/CD tools and methodologies, containerization technologies, and microservices architecture in cloud environments
Strong analytical, problem-solving, and communication skills, with the ability to collaborate effectively with cross-functional teams

Job Responsibility

Design, deploy, and maintain cloud infrastructure solutions on Azure, AWS, and/or GCP to support our applications and services
Implement infrastructure as code (IaC) principles using tools such as Terraform, ARM templates, or CloudFormation to automate provisioning and configuration management
Develop and maintain CI/CD pipelines for automated software delivery and deployment, leveraging tools such as Azure DevOps, AWS CodePipeline, or Jenkins
Monitor system performance, application health, and infrastructure metrics using cloud monitoring and logging services, and implement proactive measures to optimize performance and availability
Support incident response and resolution efforts, conduct root cause analysis, implement corrective actions, and document post-incident reviews
Collaborate with Engineering teams to design and implement scalable and reliable architectures, providing guidance on best practices for cloud-native application development
Implement security best practices and controls in cloud environments to protect data, applications, and infrastructure, and ensure compliance with regulatory requirements
Drive automation initiatives to streamline operational tasks, reduce manual effort, and improve overall efficiency in cloud operations
Stay current with cloud platform updates, trends, and best practices, and evaluate emerging technologies for potential adoption to drive innovation and efficiency
Provide support and guidance to junior team members, fostering a culture of learning, collaboration, and continuous improvement within the SRE/DevOps team

What we offer

Medical, Dental, Vision Coverage
Health and Dependent Savings Accounts
Life and Disability Programs
Paid Parental Leave
Voluntary Benefit Programs
Company Sponsored Wellness Program
Wellness Reimbursement Program
Retirement Savings
Equity Opportunities
Paid time off and Paid Holidays

Fulltime
