Site Reliability Engineer Job at NTT DATA (Bucuresti)

Job Description

As a Site Reliability Engineer, you will contribute to the overall implementation and operation of our client's Online Banking platform on Google Cloud and become a central part of the feature squads, following the 'you build it, you run it' paradigm.

Job Responsibility

Define Service Level Objectives (SLOs) and enable an end-to-end view of customer satisfaction, based on best practices for setting up Service Level Indicators (SLIs), to create effective strategies for maintaining and improving system performance and availability
Collaborate with Business Functional Analysts and Solution Architects to identify solution design improvements that increase the resilience of technical solutions early on
Consult and guide the squad on the prioritization of reliability improvements and actively deliver them as part of the sprint
Hands-on experience in implementing reliability and resilience patterns such as auto-scaling, circuit breakers, bulkheads, rate limiters, retry mechanisms, etc.
Actively work on service request fulfilment and on incident and problem management to identify and reduce toil and the MTTR using engineering best practices
Align with the SRE chapter function on state-of-the-art SRE best practices, e.g. Distributed Tracing, OpenTelemetry and Chaos Engineering, and contribute to them
Be a knowledge and skill multiplier for your profession by acting as a Lead of the Site Reliability Engineer population
Increase the seniority of the overall Site Reliability Engineer chapter by establishing events and procedures, and foster a culture of high standards
Lead the engineers of your profession and help them become better each day

Requirements

Bachelor's degree in Computer Science, Engineering, or related field
Minimum 5 years of proven work experience as a Reliability Engineer or in a similar role
Expert knowledge and hands-on experience with applications hosted on cloud platforms such as Google Cloud Platform as well as with Docker / Kubernetes in combination with Google Kubernetes Engine (GKE), Terraform or similar technology
Experience in resilient software development in Python/Java and the usage of modern CI/CD pipelines, e.g. GitHub, GitHub Actions, Bitbucket, Helm
Strong experience in the setup of observability, monitoring and self-healing solutions for instance with New Relic, Splunk, Google Cloud Operations, Lightstep and Ansible
Very good knowledge of security standards (e.g. TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt), microservice architectures and experience with API Management with Apigee or WSO2
Proactive attitude and collaborative team-player mindset paired with self-confidence
Keeping your cool and your eye for detail even in stressful situations where time matters
Having a creative approach towards solving technical problems
Excellent communication skills in English

What we offer

Smooth integration and a supportive mentor
Pick your working style: choose from Remote, Hybrid or Office work opportunities
Our projects have different working hours to suit your needs
Sponsored certifications, trainings and top e-learning platforms
Private Health Insurance – custom-made for you
Individual coaching sessions or accredited Coaching School
Epic parties or themed events – lovingly designed for our people and their families
