Data Site Reliability Engineer Job at Optiver (Singapore)

Senior Site Reliability Engineer

Architect, develop, and troubleshoot large-scale infrastructure, maintain and im...

Location

United States, San Francisco

Salary:

180960.00 - 230900.00 USD / Year

Atlassian

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Software Engineering, Information Technology or a closely related field
Four years of experience as a Site Reliability Engineer architecting, developing, and troubleshooting large-scale infrastructure using programming languages such as PowerShell, Python, or Bash and networking technologies such as TCP/IP or security
Four years of experience in automation development and infrastructure-as-code implementation using tools such as Terraform, AWS CloudFormation, Ansible, or Salt
Knowledge of Linux and Windows systems
Cloud technologies within AWS, GCP, and Azure
Continuous integration and continuous delivery/deployment (CI/CD) practices, plus monitoring and observability practices
Must pass a technical interview

Job Responsibility

Architect, develop, and troubleshoot large-scale infrastructure using programming languages such as PowerShell, Python, or Bash and networking technologies such as TCP/IP or security
Provide real-time feedback on production systems
Work with product family and platform developers to maintain and improve services and performance with a strong customer focus
Use a variety of data collection, enrichment, analytics, and visualizations to support our complex systems
Own automation development and infrastructure-as-code implementation using tools such as Terraform, AWS CloudFormation, Ansible, and/or Salt
Build solutions to enhance availability, performance, and stability for hundreds of Atlassian enterprise customers in the cloud, and automate repetitive work
Help secure the cloud architecture with penetration testing, vulnerability resolution, and compliance audit responses
Own continuous integration and continuous delivery/deployment (CI/CD) practices, plus monitoring and observability practices

What we offer

Health and wellbeing resources
Paid volunteer days

Full-time

Principal Site Reliability Engineer

Location

United States, Ft. Meade

Salary:

Not provided

CipherLogix

Expiration Date

Until further notice

Requirements

Fourteen (14) years of experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution
Ten (10) years of experience in system engineering/architecture
Ten (10) years of experience working with products that support highly distributed, massively parallel computation needs, such as HBase, Hadoop, CloudBase/Accumulo, Bigtable, Cassandra, Scality, etc.
At least ten (10) years of experience writing software scripts using scripting languages such as Perl, Python, or Ruby for software automation
At least four (4) years of experience managing and monitoring large cloud systems (>200 nodes)
Cloud Systems Administrator or Developer certification
Experience in performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems to include monitoring technical health of a system, improving organizational processes, implementation of postmortem (failure) analysis and incident management
Ten (10) years of experience in a cleared environment
Ten (10) years of demonstrated experience developing software for one of the following: Windows, UNIX, or Linux OS
Knowledge and experience with developing distributed storage routing and querying algorithms
Experience in developing documentation required to support a program’s technical issues and training situations

Full-time

Staff Site Reliability Engineer

We are looking for a Site Reliability Engineer to own our internal systems infra...

Location

United States, Sunnyvale

Salary:

175000.00 - 250000.00 USD / Year

Figure

Expiration Date

Until further notice

Requirements

Strong experience with Linux/Unix systems administration
Proficiency in programming/scripting
Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
Experience defining Service Level Objectives (SLO), developing runbooks/incident response plans, facilitating post-mortems and managing systems assets
Ability to work in cross-functional teams with developers, infra, and product teams
Excellent verbal and written communication skills

Job Responsibility

Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing, and more
Migrate SaaS to self-hosted solutions to enhance security and reliability
Implement monitoring and alerting systems, and define incident response plans and runbooks
Reduce manual workload by automating deployment and scaling
Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives
Use a data-driven approach to demonstrate service robustness and track optimization work
Partner with the security team to ensure that security remediations and updates are applied in a timely manner

Full-time

Site Reliability Engineer

Join our client, a leading financial institution at the forefront of innovation,...

Location

United States, Austin

Salary:

57.00 - 63.33 USD / Hour

Aquent

Expiration Date

Until further notice

Requirements

Proven experience leading engineering teams and delivering projects using Scrum and efficient release practices
Strong background in converting high-level designs into low-level designs and providing technical oversight
Demonstrated experience in designing, architecting, and deploying cloud-native applications, specifically on GCP
Proficiency with various database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL
Expertise in containerization technologies such as Docker and Kubernetes, and building/managing CI/CD pipelines
Experience leveraging AI-driven software development tools to enhance productivity, code comprehension, and documentation
Proven track record of integrating and applying AI/Machine Learning models for data analytics, visualization, automation, and problem-solving
Ability to maintain high quality standards while delivering within tight schedules
Exceptional collaborative mindset with a bias for action, engaging effectively with product management, architects, and other domains
Strong ability to work with internal, external, and offshore stakeholders

Job Responsibility

Drive Technical Leadership & Project Delivery: Lead engineering teams through the entire project lifecycle, leveraging agile methodologies like Scrum to ensure efficient delivery and robust release practices
Architect & Design Cloud-Native Solutions: Translate high-level architectural visions into detailed low-level designs, providing expert technical oversight for the development and deployment of cutting-edge cloud-native applications
Champion Reliability & Scalability: Design, architect, and deploy highly available and scalable cloud-native applications on platforms such as GCP, ensuring optimal performance and resilience
Optimize Data Management: Leverage your expertise with diverse database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL, to build and maintain robust data solutions
Advance DevOps & Automation: Implement and optimize containerization strategies using technologies like Docker and Kubernetes, and establish sophisticated CI/CD pipelines to streamline development and deployment
Innovate with AI/ML: Integrate and apply AI/Machine Learning models to enhance data analytics, visualization, automation, and creatively solve complex business and technical challenges
Foster Collaboration & Mentorship: Work closely with diverse stakeholders across product management, architecture, and other engineering domains, while actively mentoring and coaching multiple teams to elevate technical capabilities
Influence & Present Solutions: Effectively engage subject matter experts, present complex architectural solutions to governance boards and stakeholders, and advocate for data-driven proposals

What we offer

Subsidized health, vision, and dental plans
Paid sick leave
Retirement plans with a match

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...

Location

United States, Birmingham

Salary:

Not provided

Genuine Parts Company

Expiration Date

Until further notice

Requirements

Bachelor's degree
Three (3) to five (5) years of related experience or an equivalent combination
Intermediate knowledge of appropriate networks, products, and protocols
Knowledge of Unix, Windows NT/2000/98, internet security, Oracle ERP, and distributed computing systems
Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
Troubleshooting skills
Problem solving skills
Demonstrated knowledge and adherence to Change Management processes
Ability to interface well with customers, end users, partners, and associates

Job Responsibility

Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
Responsible for making sure that the company network works
Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
Helps to negotiate and place orders with common carriers
Performs other duties as assigned

What we offer

Healthcare coverage
401(k)
Tuition reimbursement
Vacation
Sick pay
Holiday pay

Full-time

Principal Site Reliability Engineer

We are looking for a Principal Site Reliability Engineer to join the CVML Platfo...

Location

United States

Salary:

166000.00 - 293000.00 USD / Year

Blue River Technology

Expiration Date

Until further notice

Requirements

8+ years of experience building infrastructure with K8S, AWS, and bare metal
8+ years of experience working with Python and Go (with production experience)
8+ years of experience working with infra automation tools: Terraform / Terragrunt (or Pulumi / CDK)
8+ years of experience with Linux-based systems and networks, and a deep understanding of internal components, networking, and security aspects
Has a track record of building and maintaining scalable systems in production environments
Experience in building CI/CD pipelines using GitHub Actions (or GitLab / Jenkins) for application release and deployment
Experience in using AWS ECS, EKS, IAM, EC2, and RDS at production scale
Deep understanding of Kubernetes and its internals (kubelet, CRDs, etc.) and experience with building and extending clusters from scratch
Strong problem-solving skills and ability to troubleshoot complex infrastructure and networking issues
Excellent communication skills to collaborate effectively with technical and non-technical stakeholders

Job Responsibility

System Design: Architect and implement various cloud and on-premise applications, systems, and infrastructure
Hybrid System Integration: Integrate extremely diverse systems, configure stable integration, uptime, and monitoring
Edge Device Integration: Work with edge devices of various formats and integrate them with on-prem and cloud workflows, including networking, low-level OS, and electrical/control integration
Low-Level Performance Optimization: Optimize the performance and throughput of the system at the filesystem, networking, and software levels
High-Level Optimization of Cost and Stability: Optimize cost, operational stability, and supportability of highly diverse platforms and tech stack
Product Mindset: Collaborate with cross-functional teams to design, develop, and maintain robust, scalable, and user-friendly web and mobile data-intensive applications
System Integration: Build tools that enable users to easily move between different applications and platforms to utilize the strengths of each in a coherent ecosystem
Collaboration: Work closely with cross-functional teams, including data scientists, analysts, software engineers, and product managers, to understand data requirements and deliver data solutions that align with business goals
Documentation: Create and maintain technical documentation, including data flow diagrams, architecture designs, and standard operating procedures
Technology Evaluation: Stay up-to-date with industry trends and emerging technologies related to data engineering, recommending and implementing new tools and frameworks as appropriate

What we offer

Eligibility for Blue River’s bonus and benefit programs

Full-time

Principal Site Reliability Engineer

Groupon is modernizing its global platform — and reliability is at the center of...

Location

Colombia

Salary:

Not provided

Groupon

Expiration Date

Until further notice

Requirements

10+ years in software/systems engineering
5+ years in SRE or platform reliability
Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform
Proficiency in Python or Go for automation and tooling
Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy)
Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations
Strong communication and influencing skills — data over hierarchy

Job Responsibility

Architect and maintain self-healing systems with 99.9%+ availability targets
Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns
Implement adaptive SLIs/SLOs that evolve automatically from real-time data
Build AIOps-based observability and auto-remediation pipelines
Apply predictive modeling to forecast failures before they impact users
Lead chaos, performance, and resilience testing programs
Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance
Mentor engineers and drive reliability standards across teams
Partner with platform, data, and product teams to ensure stability aligns with business goals
Support major incident response and incident review, and participate in on-call rotations

What we offer

The opportunity to work with cutting-edge technologies in a transformative environment
Professional growth and leadership development pathways tailored to your aspirations
A chance to leave a lasting impact by shaping the future of reliable and scalable systems

Principal Site Reliability Engineer

Groupon is modernizing its global platform — and reliability is at the center of...

Location

Ecuador

Salary:

Not provided

Groupon

Expiration Date

Until further notice

Requirements

10+ years in software/systems engineering, including 5+ years in SRE or platform reliability
Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform
Proficiency in Python or Go for automation and tooling
Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy)
Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations
Strong communication and influencing skills — data over hierarchy

Job Responsibility

Architect and maintain self-healing systems with 99.9%+ availability targets
Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns
Implement adaptive SLIs/SLOs that evolve automatically from real-time data
Build AIOps-based observability and auto-remediation pipelines
Apply predictive modeling to forecast failures before they impact users
Lead chaos, performance, and resilience testing programs
Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance
Mentor engineers and drive reliability standards across teams
Partner with platform, data, and product teams to ensure stability aligns with business goals
Support major incident response and incident review, and participate in on-call rotations

What we offer

The opportunity to work with cutting-edge technologies in a transformative environment
Professional growth and leadership development pathways tailored to your aspirations
A chance to leave a lasting impact by shaping the future of reliable and scalable systems

Data Site Reliability Engineer

Optiver

Location:
Singapore, Singapore

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
January 06, 2026

