Data Engineer (Production Support) for AWS EMR Job at NTT DATA (Shanghai)

Data Engineer (Production Support) for AWS EMR

The ideal candidate will ensure smooth operation, performance, and stability of ...

Location

China, Shanghai

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Hands-on experience with Spark, Scala, Hive
Experience with Kafka, NiFi, and various Amazon Web Services (AWS) tools
Familiarity with data loading tools like Talend
Familiarity with cloud databases like AWS Redshift, Aurora MySQL and PostgreSQL
Knowledge of workflow schedulers like Oozie
Strong knowledge of shell scripting, Python or Java for scripting and automation
Familiarity with SQL and query optimization techniques
Experience in production support & operations management
Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios
5 to 15 years total IT experience

Job Responsibility

Monitor data integration (data lake), troubleshoot, and resolve issues in real-time
Investigate and debug data processing failures and performance bottlenecks
Maintain and support ETL/ELT pipelines built on tools such as Spark, Scala, Hive and Glue
Ensure data quality, consistency, and availability across pipelines and storage systems like S3, Redshift, MySQL or Snowflake
Perform root cause analysis; identify and analyze any data discrepancies
Implement and monitor automated workflows using AWS tools
Analyze and optimize job performance by tuning Spark/Hive configurations and improving query efficiency
Identify and address inefficiencies in data storage and access patterns
Set up and manage monitoring tools (e.g., AWS CloudWatch, Datadog, or Prometheus) to track system health and performance
Develop alerting mechanisms and dashboards for proactive issue identification

Fulltime

Data Engineer (Production Support) for AWS EMR

We are seeking a highly skilled and motivated Data Engineer specializing in Prod...

Location

China, Shanghai

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Proficiency in managing AWS services, particularly EMR, S3, Lambda, Step Functions, and CloudWatch
Hands-on experience with distributed data processing frameworks like Apache Spark, Hive, or Presto
Experience with Kafka, NiFi, Amazon Web Services (AWS), Maven, Ambari-TEZ, Stash and Bamboo
Familiarity with data loading tools like Talend and Sqoop
Familiarity with cloud databases like AWS Redshift, Aurora MySQL and PostgreSQL
Knowledge of workflow schedulers like Oozie or Apache Airflow
Strong knowledge of shell scripting, Python or Java for scripting and automation
Familiarity with SQL and query optimization techniques
Experience in production support for large-scale distributed systems or data platforms
Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios

Job Responsibility

Monitor, troubleshoot, and resolve issues in real-time for AWS EMR clusters and associated data pipelines
Investigate and debug data processing failures, latency issues, and performance bottlenecks
Provide support for mission-critical production systems as part of an on-call rotation
Manage AWS EMR cluster lifecycle, including creation, scaling, termination, and optimization
Ensure effective resource utilization and cost optimization of clusters
Apply patches and upgrades to EMR clusters and software components as needed
Maintain and support ETL/ELT pipelines built on tools such as Apache Spark, Hive, or Presto running on EMR
Ensure data quality, consistency, and availability across pipelines and storage systems like S3, Redshift, MySQL or Snowflake
Implement and monitor automated workflows using AWS tools like Step Functions, Lambda, and CloudWatch
Analyze and optimize EMR job performance by tuning Spark/Hive configurations and improving query efficiency

Fulltime

Lead Data Engineer

Lead Data Engineer Do you love building and pioneering in the technology space?...

Location

United States, San Francisco; New York

Salary:

215200.00 - 245600.00 USD / Year

Capital One

Expiration Date

Until further notice

Requirements

Bachelor's Degree
At least 4 years of experience in application development (Internship experience does not apply)
At least 2 years of experience in big data technologies
At least 1 year of experience with cloud computing (AWS, Microsoft Azure, Google Cloud)
7+ years of experience in application development including Python, SQL, Scala, or Java
4+ years of experience with a public cloud (AWS, Microsoft Azure, Google Cloud)
4+ years of experience with distributed data/computing tools (MapReduce, Hadoop, Hive, EMR, Kafka, Spark, Gurobi, or MySQL)
4+ years of experience working on real-time data and streaming applications
4+ years of experience with NoSQL implementation (Mongo, Cassandra)
4+ years of data warehousing experience (Redshift or Snowflake)

Job Responsibility

Collaborate with and across Agile teams to design, develop, test, implement, and support technical solutions in full-stack development tools and technologies
Work with a team of developers with deep experience in machine learning, distributed microservices, and full stack systems
Utilize programming languages like Java, Scala, Python and Open Source RDBMS and NoSQL databases and Cloud based data warehousing services such as Redshift and Snowflake
Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
Perform unit tests and conduct reviews with other team members to make sure your code is rigorously designed, elegantly coded, and effectively tuned for performance

What we offer

Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
Comprehensive, competitive, and inclusive set of health, financial and other benefits

Fulltime

Senior Software Engineer, Data

We're the world's leading sports technology company, at the intersection between...

Location

Austria, Vienna

Salary:

Not provided

Sportradar

Expiration Date

Until further notice

Requirements

5+ years of data engineering experience with proven track record of leading complex data projects from conception to delivery
Exceptional communication skills and experience working in cross-functional teams with analysts, product managers, and business stakeholders
AWS & Data Engineering: Very strong hands-on experience with AWS services (S3, Lambda, Glue, Athena, Redshift, EMR, etc.) and proficiency with Apache Spark for large-scale data processing
Backend Development: Strong experience with Python for building data processing services and APIs, plus expert-level SQL for data processing and analytics
Infrastructure & DevOps: Hands-on experience with Docker, Terraform, and CI/CD pipelines with automation best practices for data systems
Clean Code Advocate: Strong commitment to writing clean, maintainable, well-documented code with comprehensive testing and deep knowledge of analytics/reporting requirements
Data Architecture: Experience designing scalable data architectures, data modeling, and optimizing data processing workflows
Dashboard Development: Experience creating and managing analytics dashboards in BI tools (Tableau, Qlik Sense, Quicksuite, Power BI) and data visualization solutions to present complex insights to stakeholders

Job Responsibility

Scale & Performance Engineering: Processing and analyzing terabytes of advertising data with sub-second query performance while building and maintaining robust ETL pipelines using Spark and AWS services to handle massive data volumes daily
Data Pipeline Architecture & Development: Designing and building scalable data processing systems, developing backend APIs and microservices (Python or Go), architecting data flows that support both batch and real-time analytics requirements, and managing user-facing dashboards that visualize complex data insights
Infrastructure & Data Quality Operations: Implementing robust monitoring and alerting systems to detect data quality issues, managing AWS infrastructure using Terraform, implementing CI/CD best practices, and maintaining high coding standards across data processing systems
Cross-Functional Leadership & Collaboration: Leading large-scale data projects from requirements gathering through delivery, bridging technical implementation with business requirements, mentoring team members, and presenting technical concepts to stakeholders while challenging requirements constructively
End-to-End Data System Ownership: Taking complete ownership of complex data engineering projects while ensuring high availability and accuracy for both internal stakeholders and external clients, championing clean code principles, and serving as a knowledge leader who supports delivering the right data solutions

What we offer

A collaborative environment with colleagues from all over the world (Engineering offices in Europe, Asia and US)
Ability to shape your own workday and career via a clearly defined professional and personal development plan
Opportunity to work with senior leadership, develop yourself and build your career within an inspiring and fast-growing company and digital sports environment
A vibrant and inclusive community, including Women in Tech and Pride groups which welcome all participants
A company culture that promotes social aspects, sports, physical exercise and fun
Innovative and cross-team challenges like ShipIt, office sports tournaments in Darts and Table Tennis and unique beer brewing competitions
Competitive salary and benefits (e.g. retirement pension and insurance plan)
Sportradar takes over the full costs of € 365.- for the Öffi-Ticket (Jahreskarte) for you

Fulltime

Senior Data Engineer – Informatica

Whitehall Resources currently require an experienced Data Engineer to work with ...

Location

United Kingdom, London

Salary:

Not provided

Whitehall Resources Ltd

Expiration Date

Until further notice

Requirements

Hands-on expertise in AWS & Spark: Amazon EMR, S3, Lambda
Strong PySpark/Python and SQL for large-scale batch processing
Data engineering at scale in government or similarly complex domains, including performance tuning and data quality management
CI/CD & DevOps: pipelines and IaC (e.g., Terraform), automated testing, and release governance
Version control & collaboration: Git/GitLab, code review, branching strategies, and trunk/PR workflows
APIs & integration: building/consuming data services to move and expose data safely and reliably
Agile ways of working with Jira/Confluence
Clear stakeholder communication and concise technical documentation
Security clearance: BPSS (minimum) and SC cleared or SC clearable for UK government work.

Job Responsibility

Engineer production-grade data pipelines on AWS (EMR, S3, Lambda), using PySpark/Python and SQL, with a focus on performance, resilience, testing, and observability
Migrate and modernise legacy workloads (e.g., ETL jobs and reporting feeds) onto cloud native services, creating reusable components and shared frameworks
Support reporting & MI use cases, including transformations and data models that feed downstream tools (e.g., Power BI)
Own CI/CD and version control practices (e.g., Git/GitLab), review code, and enforce engineering standards
Coach and mentor engineers, provide technical guidance/code reviews, and contribute to architectural decisions across squads
Work in Agile delivery, collaborating across product, data, and platform teams using Jira/Confluence
Translate requirements into robust engineering tasks
Embed security and compliance by design, aligning with BPSS/SC constraints and department data handling policies

Fulltime

Data Engineer

Role: Data Engineer. Experience: 10 to 15 years. Full-time permanent FTE.

Location

United States, New York

Salary:

159000.00 USD / Year

Realign

Expiration Date

Until further notice

Requirements

Hands-on experience in building ETL using Databricks SaaS infrastructure
Experience in developing data pipeline solutions to ingest and exploit new and existing data sources
Expertise in leveraging SQL, programming languages like Python, and ETL tools like Databricks
Perform code reviews to ensure requirements, optimal execution patterns and adherence to established standards
Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), AWS Data Integration (Glue)
Advanced understanding of Container Orchestration services including Docker and Kubernetes, and a variety of AWS tools and services
Good understanding of AWS Identity and Access Management, AWS networking and AWS monitoring tools
Proficiency in CI/CD and deployment automation using GitLab pipelines
Proficiency in Cloud infrastructure provisioning tools e.g., Terraform
Proficiency in one or more programming languages e.g., Python, Scala

Job Responsibility

Work on migrating applications from an on-premises location to the cloud service providers
Develop products and services on the latest technologies through contributions in Development, enhancements, testing and implementation
Develop, modify, extend code for building cloud infrastructure, and automate using CI/CD pipeline
Partner with business and peers in the pursuit of solutions that achieve business goals through an agile software development methodology
Perform problem analysis, data analysis, reporting, and communication
Work with peers across the system to define and implement best practices and standards
Assess applications and help determine the appropriate application infrastructure patterns
Use the best practices and knowledge of internal or external drivers to improve products or services

Fulltime

Lead Healthcare Data Engineer - Epic

We are looking for a lead data engineer highly collaborative and comfortable wit...

Location

United States, Remote

Salary:

126148.63 - 163993.22 USD / Year

Baptist Health

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science or a related field
Master's degree preferred
Highly prefer Epic Certified in Caboodle and Clarity Data Models
Must possess strong understanding of Clarity Data Models and associated databases
Possess proven experience working with data warehouse architecture and solutions
At least 10 years (with a Bachelor's) or at least 8 years (with a Master's) of recent experience in data engineering and end-to-end automation of data pipelines
Technical Skills: Data Warehousing, Strong SQL, and Python
Strong understanding of data science and business intelligence workflows
Programming experience, ideally in Python & SQL
Experience with large-scale data warehousing and analytics projects, including using AWS and GCP technologies

Job Responsibility

Partner closely with various teams to design, build and maintain data systems and applications
Work with various teams to analyze and design data storage structures and build performant, automated data pipelines that are reliable and scalable in a fast-growing data ecosystem
Leverage strong understanding of data modeling principles and modern data platforms to properly design and implement data pipeline solutions
Provide production support and adhere to the defined SLA(s)

Fulltime
