Data Site Reliability Engineer Job at Optiver (Shanghai)

Senior Site Reliability Engineer - Data Pipeline

Bloomreach is building the world’s premier agentic platform for personalization....

Location

Czechia

Salary:

Not provided

Bloomreach

Expiration Date

Until further notice

Requirements

You can articulate how your contributions have transformed the way engineers work and think by fostering a strong DevOps/SRE culture
You can demonstrate how impactful your work as an SRE or DevOps Engineer can be in connection to business success
You understand the importance of the "you build it, you run it" principle and you love the feeling of ownership
You are mindful of the costs associated with running our service, which translates into effective vertical and horizontal pod autoscaling and detailed telemetry insights
You believe infrastructure as code is the only thing that can bring stability to chaos
Terraform is your daily bread, and Helm deployments are your second-best friend
You use telemetry data and metrics to provide feedback to engineers on how the application and services behave
You can find your way around a complex service architecture using distributed debugging
You have experience with Python and a solid grasp of engineering practices
You don’t hesitate to participate in the 24/7 on-call support rotation

Job Responsibility

Your task is to build and maintain an ecosystem where engineers can safely and efficiently develop, debug and operate their services running on GCP and Kubernetes, using Dataflow, Dataproc, Python and Go
You make sure the services have a high level of observability, enabling us to provide quality service for our customers
You make sure services can scale vertically and horizontally based on current load and on operational and telemetry data (OTEL, Prometheus, VictoriaMetrics)
You make sure the team has enough insight into the health of our services (Grafana, alerting, PagerDuty)
You help the team fulfill the security requirements of ISO and SOC2 audits by enforcing security principles such as key distribution, key rotation, service-level authorisation and authentication, data encryption in transit, data isolation, resource limits, quality of service and audit logs (mainly via Envoy proxies)
You contribute to our tooling, so we have tools in place for debugging, troubleshooting and performance testing
You automate manual and semi-manual deployment and instance setup steps
You are hands-on with L3 support and incident resolution
You make sure CI pipelines have linters, security scans and code smell detection, enabling engineers to produce quality MRs

What we offer

A great deal of freedom and trust
We have defined our 5 values and the 10 underlying key behaviors that we strongly believe in
We believe in flexible working hours to accommodate your working style
We work virtual-first with several Bloomreach Hubs available across three continents
We organize company events to experience the global spirit of the company and get excited about what's ahead
We encourage and support our employees to engage in volunteering activities - every Bloomreacher can take 5 paid days off to volunteer
We have a People Development Program, with personal development workshops on various topics run by experts from inside the company
Our resident communication coach Ivo Večeřa is available to help navigate work-related communications & decision-making challenges
Our managers are strongly encouraged to participate in the Leader Development Program
Bloomreachers utilize the $1,500 professional education budget on an annual basis to purchase education products (books, courses, certifications, etc.)

Fulltime

Senior Site Reliability Engineer - Data Pipeline

The Data Pipeline team is a backend-focused engineering team that is built on st...

Location

Slovakia

Salary:

3500.00 EUR / Month

Bloomreach

Expiration Date

Until further notice

Requirements

You can articulate how your contributions have transformed the way engineers work and think by fostering a strong DevOps/SRE culture
You can demonstrate how impactful your work as an SRE or DevOps Engineer can be in connection to business success
You understand the importance of the "you build it, you run it" principle and you love the feeling of ownership
You are mindful of the costs associated with running our service, which translates into effective vertical and horizontal pod autoscaling and detailed telemetry insights
You believe infrastructure as code is the only thing that can bring stability to chaos
Terraform is your daily bread, and Helm deployments are your second-best friend
You use telemetry data and metrics to provide feedback to engineers on how the application and services behave
You can find your way around a complex service architecture using distributed debugging
You have experience with Python and a solid grasp of engineering practices
Experience with Go or with ETL pipelines is a big advantage

Job Responsibility

Build and maintain an ecosystem where engineers can safely and efficiently develop, debug and operate their services running on GCP and Kubernetes, using Dataflow, Dataproc, Python and Go
Make sure the services have a high level of observability, enabling us to provide quality service for our customers
Ensure services can scale vertically and horizontally based on current load and on operational and telemetry data (OTEL, Prometheus, VictoriaMetrics)
Ensure the team has enough insight into the health of our services (Grafana, alerting, PagerDuty)
Help the team fulfill the security requirements of ISO and SOC2 audits by enforcing security principles such as key distribution, key rotation, service-level authorisation and authentication, data encryption in transit, data isolation, resource limits, quality of service and audit logs (mainly via Envoy proxies)
Contribute to our tooling, so we have tools in place for debugging, troubleshooting and performance testing
Automate manual and semi-manual deployment and instance setup steps
Be hands-on with L3 support and incident resolution
Ensure CI pipelines have linters, security scans and code smell detection, enabling engineers to produce quality MRs

What we offer

A great deal of freedom and trust
Flexible working hours
Work virtual-first with several Bloomreach Hubs available across three continents
Company events
5 paid days off to volunteer
People Development Program
Communication coach available
Leader Development Program
$1,500 professional education budget annually
Employee Assistance Program with counselors

Fulltime

Site Reliability Engineer - Data Platform Operation

Join our Data & AI Platform team as a Site Reliability Engineer (SRE) – Platform...

Location

Brazil, Sao Paulo

Salary:

Not provided

Amaris Consulting

Expiration Date

Until further notice

Requirements

Academic background: Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field (minimum 3 years of experience)
Experience: 5+ years hands-on with cloud platforms (Azure, AWS, GCP), programming (Bash, PowerShell, Terraform, Python, Java), and Infrastructure as Code (IaC)
English language: Professional working proficiency in English and the local language
Tools / software: Deep expertise in Azure, Databricks, Unity Catalog, Kubernetes, Helm, Docker, Power BI, Datadog, Grafana, GitHub, Azure DevOps, ArgoCD, Airflow, SSIS, Power Query, and relational/NoSQL databases
AI experience: Experience supporting enterprise Data & AI platforms
Soft skills: Analytical problem-solving
Effective communication and active listening
Team player with respect for others
Strong troubleshooting and platform monitoring skills
Automation (Python, PowerShell, CLI, KQL, Terraform)

Job Responsibility

Support, manage, and maintain Azure resources: Azure SQL, Synapse, Data Factory, Databricks, Unity Catalog
Monitor Azure workloads, troubleshoot incidents, alerts, and performance bottlenecks
Implement and manage RBAC, identity & access policies, and compliance controls
Optimize Azure cost and performance using Azure Monitor, Datadog, and Cost Management tools
Automate tasks using PowerShell, Azure CLI, Terraform, and Python
Utilize Git, GitHub Actions, and Airflow for workflow automation
Provide L2/L3 support for data pipelines, reporting, and cloud services
Conduct incident response, root cause analysis (RCA), and proactive issue resolution
Collaborate with Cloud Engineering, Data Engineers, BI Developers, and Cloud Architects
Follow ITSM processes: Incident, Change, and Problem Management

What we offer

An international community bringing together 110+ different nationalities
An environment where trust has a central place: 70% of our key leaders started their careers at the first level of responsibility
A robust training system with our internal Academy and 250+ available modules
A vibrant workplace that frequently gathers for internal events (afterworks, team buildings, etc.)
Strong commitments to CSR, notably through participation in our WeCare Together program