
Site Reliability Engineer

United States, Bridgewater · Employment contract · Job Posted February 13, 2026

Job Description

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge reliability engineering practices to build scalable, secure, and highly available systems. As we continue to grow, we’re looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.

Requirements

  • Site Reliability Engineering principles
  • Linux
  • Python / Go
  • Cloud platforms (AWS / Azure / GCP)
  • Kubernetes
  • Docker
  • Helm
  • CI/CD pipelines
  • Infrastructure as Code (Terraform)
  • Monitoring & Alerting (Prometheus, Grafana)
  • Logging (ELK stack)
  • Incident Management
  • Automation & Scripting
  • Git
  • Agile methodologies
  • High Availability & Disaster Recovery
  • 3 to 5 years of hands-on industry experience
  • Willing to work on W2
  • Willing to relocate nationwide
  • Willing to take Online Coding Test
  • Looking for H-1B sponsorship for the 2026 quota
  • OPT/CPT/H4 EAD/TN/E3 or any other Non-immigrant visa status

What we offer

H-1B sponsorship for the 2026 quota

Similar Jobs for Site Reliability Engineer

8 matching positions

Site Reliability Engineer

Qargo is a cloud-based (SaaS) Transport Management Platform. We are a scale-up b...
Location: Belgium, Ghent
Salary: Not provided
Company: Qargo
Expiration Date: Until further notice
Requirements
  • Experience as a Software Engineer, with an interest in infrastructure, scalability, reliability
  • Strong programming skills (preferably Python or similar backend languages)
  • Experience working with cloud platforms, container orchestrators, serverless (preferably Google Cloud)
  • Familiarity with distributed systems and scalability challenges
  • Experience with CI/CD pipelines and automation
  • Solid understanding of databases and performance tuning (SQL and/or NoSQL)
  • Familiarity with monitoring and observability tools
  • A problem-solving mindset and the ability to think in systems
  • Strong collaboration skills and a proactive approach to improving systems
Job Responsibilities
  • Build and maintain systems and tooling that improve the reliability, scalability, and performance of our platform
  • Improve software delivery cycle, focusing on automation and developer experience
  • Develop internal tools and services to reduce manual operational work
  • Improve observability by implementing monitoring, logging, and alerting across systems
  • Optimize system performance, including databases such as PostgreSQL and Firestore
  • Collaborate with backend engineers and other engineering teams to design reliable and scalable system architectures
  • Troubleshoot complex production issues and implement long-term fixes
  • Continuously improve infrastructure (Infrastructure as Code, automation, etc.)
What we offer
  • A fast-growing SaaS company with a strong mission and an impact-driven team
  • A flexible work environment with flexible hours and hybrid working
  • A green office with a great atmosphere and lots of initiatives
  • A role with a lot of responsibility, ownership, and tangible impact
  • The opportunity to grow with us and shape both your career and our platform
  • Full-time

Site Reliability Engineer

We are looking for a Site Reliability Engineer (SRE) to support reliable, high-p...
Location: United States, Novi
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Information Technology, Computer Science, Computer Engineering, or comparable practical experience
  • At least 5 years of experience supporting production environments in a corporate, startup, or similarly fast-paced technical setting
  • Hands-on expertise with infrastructure as code, including Terraform, along with experience in cloud platforms and related services
  • Working knowledge of container technologies such as Docker and orchestration platforms like Kubernetes
  • Experience supporting live systems, participating in on-call rotations, and contributing to incident reviews and corrective actions
  • Proficiency with automation and scripting using Bash and Python to reduce manual operational effort
  • Strong communication skills with the ability to explain technical decisions and tradeoffs to cross-functional or non-technical stakeholders
  • Willingness and ability to travel to customer or plant locations as business needs require
Job Responsibilities
  • Maintain dependable and secure production environments across plant-edge and cloud-based systems, with a focus on uptime, responsiveness, and operational stability
  • Design, refine, and support monitoring dashboards, alerting frameworks, and operational runbooks using tools such as Prometheus, Grafana, and modern telemetry solutions
  • Build and manage infrastructure through code using Terraform, applying version control standards, peer reviews, and controlled deployment processes
  • Create automation scripts and lightweight tools in Bash and Python to streamline routine operations, recovery procedures, backup workflows, and environment setup
  • Take part in incident response and on-call coverage, troubleshoot service disruptions, coordinate initial communication, and document follow-up actions through blameless reviews
  • Establish and measure service reliability indicators and objectives, helping stakeholders balance system dependability with release speed and operational risk
  • Support secure connectivity between factory networks and cloud resources by configuring and maintaining VPNs, routing, private networking, and access controls
  • Administer and optimize relational or time-series databases, including backup planning, replication, performance tuning, and long-term storage health
  • Contribute to CI/CD delivery practices by improving deployment pipelines, supporting controlled release strategies, and preparing rollback procedures when needed
  • Partner with controls, software, and data teams to enable reliable data flow from industrial systems and ensure safe deployment to edge infrastructure
What we offer
  • medical, vision, dental, and life and disability insurance
  • 401(k) plan

Site Reliability Engineer

As a Site Reliability Engineer, you are passionate about experience innovation a...
Location: India, Bengaluru
Salary: Not provided
Company: Valtech
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
  • 2+ years in DevOps, SRE, or Support Engineering roles
  • Experience with incident management in high-traffic, public-facing platforms
  • Strong scripting skills (Python, Bash, or PowerShell)
  • Familiarity with CI/CD tools: GitHub Actions, Azure DevOps, GitLab, Jenkins
  • Experience with monitoring/APM tools: Datadog, New Relic, Dynatrace, Prometheus, Grafana
  • Basic knowledge of serverless services in AWS, Azure, or GCP
  • Proficiency with Docker and containerized environments
  • Excellent English communication skills (B2+ level)
  • Experience working in international, cross-cultural teams
Job Responsibilities
  • Maintain and improve observability systems (monitoring, logging, alerting)
  • Define, adjust, and maintain Service Level Objectives (SLOs)
  • Participate in incident resolution and on-call rotations (max 1 week/month)
  • Drive proactive reliability improvements across platforms
  • Collaborate with teams to analyze failure scenarios and implement mitigations
  • Create and maintain runbooks for incident response and prevention
  • Eliminate non-value-adding tasks through automation and process optimization
What we offer
  • Flexibility, with hybrid work options (country-dependent)
  • Learning and development, with access to cutting-edge tools, training and industry experts
  • Full-time

Site Reliability Engineer

NetApp is looking for a Senior TechOps Engineer - Cassandra to join our growing ...
Location: India, Bengaluru
Salary: Not provided
Company: NetApp
Expiration Date: Until further notice
Requirements
  • Strong experience in Apache Cassandra administration and architecture, with a desire to continuously learn and develop to an expert level
  • Experience in diagnosing and recommending mitigation strategies for Cassandra-related issues, including performance degradation due to resource bottlenecks, suboptimal data modeling leading to hot partitions, excessive tombstones, and inefficiencies caused by range slices and poorly constructed queries
  • Hands-on experience with Cassandra architecture and core administrative tasks, including compactions, repairs, backup and recovery, schema disagreement resolution, and configuration management
  • Experience handling Cassandra maintenance activities, including upgrades and migrations
  • Ability to investigate and research Cassandra issues by reviewing the Apache Cassandra codebase
  • Strong knowledge and experience with Linux, with the ability to work comfortably from the command line
  • Exceptional ability to communicate clearly and professionally in written and verbal English
  • Experience working with at least one public cloud platform, preferably AWS
  • Prior IT customer service or support experience within an ITIL-based environment
  • Strong fundamental computer science and software engineering skills, particularly in operating system internals, memory management, and networking
Job Responsibilities
  • Your work will ensure the security, reliability, and performance of world-class systems and databases
  • You will collaborate with the technical teams of our customers, who are globally recognized companies in the gaming, banking, and logistics industries, ranging from large multinationals to emerging start-ups
What we offer
  • Volunteer time off
  • Well-being
  • Time away
  • Full-time

Site Reliability Engineer

As Site Reliability Engineer you will contribute to the overarching implementati...
Location: Romania, Bucuresti
Salary: Not provided
Company: NTT DATA
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field
  • Minimum 5 years proven work experience as a Reliability Engineer or similar role
  • Expert knowledge and hands-on experience with applications hosted on cloud platforms such as Google Cloud Platform as well as with Docker / Kubernetes in combination with Google Kubernetes Engine (GKE), Terraform or similar technology
  • Experience in resilient software development in Python/Java and the use of modern CI/CD pipelines, e.g. GitHub, GitHub Actions, Bitbucket, Helm
  • Strong experience in the setup of observability, monitoring, and self-healing solutions, for instance with New Relic, Splunk, Google Cloud Operations, Lightstep, and Ansible
  • Very good knowledge of security standards (e.g. TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt), microservice architectures, and experience with API management using Apigee or WSO2
  • Proactive attitude and collaborative team-player mindset paired with self-confidence
  • Ability to stay calm and keep an eye for detail even in stressful, time-critical situations
  • A creative approach to solving technical problems
  • Excellent communication skills in English
Job Responsibilities
  • Define Service Level Objectives (SLOs), and enable an end-to-end view on customer satisfaction based on best practices for setting up Service Level Indicators (SLIs) to create effective strategies for maintaining and improving system performance and availability
  • Collaborate with Business Functional Analysts and Solution Architects to find improvements in the solution design to improve the resilience of technical solutions early on
  • Consult and guide the squad on the prioritization of reliability improvements and actively deliver them as part of the sprint
  • Implement reliability and resilience patterns hands-on, such as auto-scaling, circuit breakers, bulkheads, rate limiters, and retry mechanisms
  • Actively work on service request fulfilment and incident and problem management to identify and reduce toil and MTTR using engineering best practices
  • Align on and contribute to state-of-the-art SRE practices (e.g. distributed tracing, OpenTelemetry, and chaos engineering) with the SRE chapter function
  • Act as a knowledge and skill multiplier for your profession by leading the Site Reliability Engineer population
  • Increase the seniority of the overall Site Reliability Engineer chapter by establishing events and procedures, and foster a culture of high standards
  • Lead the engineers in your profession and help them improve each day
What we offer
  • Smooth integration and a supportive mentor
  • Pick your working style: choose from Remote, Hybrid or Office work opportunities
  • Our projects have different working hours to suit your needs
  • Sponsored certifications, trainings and top e-learning platforms
  • Private Health Insurance – custom-made for you
  • Individual coaching sessions or accredited Coaching School
  • Epic parties or themed events – lovingly designed for our people and their families
  • Full-time

Site Reliability Engineer

Build the tools and systems that make M365 sovereign cloud operations faster, sm...
Location: United States, Multiple Locations
Salary: 102100.00 - 219200.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Passionate about distributed systems and working with highly scalable services
  • Enjoys new technological challenges and is motivated to solve them
  • Excited about making better software and continuously improving the development, integration, and deployment processes
  • Self-starter who thrives in a bottom-up, fast-paced, highly technical environment
  • Effective collaborator, experienced in creating technical partnerships across teams
  • Committed to ensuring exceptional customer satisfaction through technical excellence
  • Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role
  • The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI)
Job Responsibilities
  • Creates and implements code for a product, service, or feature, reusing code as applicable with minimal supervision
  • Acts as a designated responsible individual (DRI), working on-call to monitor a system/product feature/service for degradation, downtime, or interruptions
  • Maintains live site service operations on a rotational, on-call basis, responding quickly to mitigate issues that arise while following security best practices and using the minimum required permissions
  • Contributes to identifying dependencies, and incorporates them into the development of design documents for a product area with little oversight
  • Contributes to the identification of requirements for, and development of automation within production and deployment of a complex product feature, targeting zero-touch deployment when possible
  • Works with appropriate internal stakeholders to understand and determine customer/user requirements for a set of features
  • Remains current in skills by investing time and effort into being informed of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale
What we offer
  • Certain roles may be eligible for benefits and other compensation
  • Full-time

Site Reliability Engineer

We are looking for a Site Reliability Engineer to support the stability, perform...
Location: United States, New York
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related discipline, or equivalent practical experience in infrastructure or operations
  • Working knowledge of Linux and/or Windows server administration fundamentals
  • Understanding of core networking principles such as TCP/IP, DNS, VLANs, routing, and firewall concepts
  • Experience with at least one scripting or automation language such as Python, Bash, or PowerShell
  • Familiarity with cloud infrastructure concepts in at least one major platform, such as Azure or AWS
  • Exposure to automation and configuration tools such as Terraform or Ansible
  • Strong analytical thinking, troubleshooting ability, and a willingness to learn in a fast-moving technical environment
  • Clear written and verbal communication skills with the ability to document operational procedures effectively
Job Responsibilities
  • Oversee the health of production platforms through monitoring tools, assist with incident response, and help refine alerts, dashboards, and issue tracking processes
  • Support day-to-day operations for infrastructure spanning on-premises facilities and cloud environments, including servers, storage, network components, and middleware services
  • Contribute to the administration of multi-cloud resources across platforms such as Azure and Amazon EC2, with involvement in compute, networking, storage, and identity-related tasks
  • Build and enhance automation solutions using Infrastructure as Code practices to streamline repeatable work and improve platform consistency
  • Participate in DevSecOps and GitOps processes by assisting with CI/CD workflows, configuration management, and policy adherence
  • Help strengthen cloud security by identifying configuration gaps, assisting with remediation efforts, and supporting vulnerability reduction initiatives
  • Join the on-call rotation, respond to operational events, and contribute to post-incident reviews focused on continuous improvement
  • Create and maintain runbooks, technical procedures, and system documentation to improve operational readiness and knowledge sharing
  • Assist with containerized and orchestrated environments, including platforms that use Kubernetes, to support scalable application operations
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • company 401(k) plan
  • Full-time

Site Reliability Engineer

Barclays is seeking a Site Reliability Engineer to join its Securitized Products...
Location: United States, Whippany
Salary: 170000.00 - 230000.00 USD / Year
Company: Barclays
Expiration Date: Until further notice
Requirements
  • Programming or scripting experience (Python, Go, PowerShell, Bash, or similar) and SQL
  • Linux/Unix/Windows systems and systems engineering fundamentals
  • Knowledge of client-server architecture and scalability, including handling high traffic by distributing load across multiple backend servers
  • Performance monitoring and reducing latency in request-response cycles
  • Containers and orchestration (Docker, Kubernetes)
  • Networking (TCP/IP, DNS, HTTP, SFTP) and relational databases
  • Monitoring and observability tools (Geneos ITRS, Prometheus, Grafana, APM, Observe)
Job Responsibilities
  • Development and delivery of high-quality software solutions by using industry aligned programming languages, frameworks, and tools
  • Cross-functional collaboration with product managers, designers, and other engineers to define software requirements, devise solution strategies, and ensure seamless integration and alignment with business objectives
  • Collaboration with peers, participation in code reviews, and promotion of a culture of code quality and knowledge sharing
  • Staying informed of industry technology trends and innovations and actively contributing to the organization’s technology communities to foster a culture of technical excellence and growth
  • Adherence to secure coding practices to mitigate vulnerabilities, protect sensitive data, and ensure secure software solutions
  • Implementation of effective unit testing practices to ensure proper code design, readability, and reliability
What we offer
  • medical, dental and vision coverage
  • 401(k)
  • life insurance
  • other paid leave
  • incentive award
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
  • Full-time