
Data Engineer (Production Support) for AWS EMR

China, Shanghai · Job posted April 23, 2026

Job Description

We are seeking a highly skilled and motivated Data Engineer specializing in Production Support for AWS EMR (Elastic MapReduce), with knowledge of Spark, Scala, and Talend or another ETL tool, to join our dynamic team. The ideal candidate will ensure the smooth operation, performance, and stability of large-scale distributed data processing pipelines and applications deployed on AWS EMR. This role requires a mix of strong technical expertise, problem-solving skills, and operational excellence.

Job Responsibility

  • Monitor, troubleshoot, and resolve issues in real time for AWS EMR clusters and associated data pipelines
  • Investigate and debug data processing failures, latency issues, and performance bottlenecks
  • Provide support for mission-critical production systems as part of an on-call rotation
  • Manage AWS EMR cluster lifecycle, including creation, scaling, termination, and optimization
  • Ensure effective resource utilization and cost optimization of clusters
  • Apply patches and upgrades to EMR clusters and software components as needed
  • Maintain and support ETL/ELT pipelines built on tools such as Apache Spark, Hive, or Presto running on EMR
  • Ensure data quality, consistency, and availability across pipelines and storage systems such as S3, Redshift, MySQL, or Snowflake
  • Implement and monitor automated workflows using AWS tools like Step Functions, Lambda, and CloudWatch
  • Analyze and optimize EMR job performance by tuning Spark/Hive configurations and improving query efficiency
  • Identify and address inefficiencies in data storage and access patterns
  • Provide optimal solutions for performance enhancement and fine-tuning of current applications
  • Set up and manage monitoring tools (e.g., AWS CloudWatch, Datadog, or Prometheus) to track system health and performance
  • Develop alerting mechanisms and dashboards for proactive issue identification
  • Provide daily/weekly monitoring reports on job status and raise alerts on any long-running or resource-intensive jobs
  • Collaborate with software developers, data scientists, and DevOps teams to resolve issues and optimize workflows
  • Maintain comprehensive documentation for troubleshooting guides, operational workflows, and best practices

Requirements

  • Proficiency in managing AWS services, particularly EMR, S3, Lambda, Step Functions, and CloudWatch
  • Hands-on experience with distributed data processing frameworks like Apache Spark, Hive, or Presto
  • Experience with Kafka, NiFi, Amazon Web Services (AWS), Maven, Ambari-TEZ, Stash, and Bamboo
  • Familiarity with data loading tools such as Talend and Sqoop
  • Familiarity with cloud databases such as AWS Redshift, Aurora MySQL, and PostgreSQL
  • Knowledge of workflow schedulers such as Oozie or Apache Airflow
  • Strong knowledge of shell scripting, Python, or Java for scripting and automation
  • Familiarity with SQL and query optimization techniques
  • Experience in production support for large-scale distributed systems or data platforms
  • Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios
  • Ability to apply data modelling concepts and methodologies to optimize data warehouse solutions
  • Ability to maintain detailed Standard Operating Procedures (SOPs) using flow diagrams, source-to-target mappings, system architecture diagrams, and use cases
  • Strong analytical skills to debug complex systems and resolve performance bottlenecks
  • Effective communication skills to coordinate with cross-functional teams
  • A proactive and customer-focused attitude to provide excellent production support
  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 10+ years of experience in data engineering, production support, or a similar role, including at least 3-5 years on the AWS Cloud platform

Nice to have

  • Experience with CI/CD tools like Jenkins or GitLab for pipeline deployments
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
  • Knowledge of data governance, security, and compliance in cloud environments
  • Certifications in AWS (e.g., AWS Certified Big Data Specialty or AWS Certified Solutions Architect)

Similar Jobs for Data Engineer (Production Support) for AWS EMR

8 matching positions

Data Engineer (Production Support) for AWS EMR

The ideal candidate will ensure smooth operation, performance, and stability of ...
Location: China, Shanghai
Salary: Not provided
Company: NTT DATA
Expiration Date: Until further notice
Requirements
  • Hands-on experience with Spark, Scala, and Hive
  • Experience with Kafka, NiFi, and various Amazon Web Services (AWS) tools
  • Familiarity with data loading tools such as Talend
  • Familiarity with cloud databases such as AWS Redshift, Aurora MySQL, and PostgreSQL
  • Knowledge of workflow schedulers such as Oozie
  • Strong knowledge of shell scripting, Python, or Java for scripting and automation
  • Familiarity with SQL and query optimization techniques
  • Experience in production support and operations management
  • Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios
  • 5 to 15 years of total IT experience
Job Responsibility
  • Monitor data integration (data lake) processes, troubleshoot, and resolve issues in real time
  • Investigate and debug data processing failures and performance bottlenecks
  • Maintain and support ETL/ELT pipelines built on tools such as Spark, Scala, Hive and Glue
  • Ensure data quality, consistency, and availability across pipelines and storage systems like S3, Redshift, MySQL or Snowflake
  • Perform root cause analysis; identify and analyze any data discrepancies
  • Implement and monitor automated workflows using AWS tools
  • Analyze and optimize job performance by tuning Spark/Hive configurations and improving query efficiency
  • Identify and address inefficiencies in data storage and access patterns
  • Set up and manage monitoring tools (e.g., AWS CloudWatch, Datadog, or Prometheus) to track system health and performance
  • Develop alerting mechanisms and dashboards for proactive issue identification
Employment type: Full-time

Data Engineer (Production Support) for AWS EMR

We are seeking a highly skilled and motivated Data Engineer specializing in Prod...
Location: China, Shanghai
Salary: Not provided
Company: NTT DATA
Expiration Date: Until further notice
Requirements
  • Proficiency in managing AWS services, particularly EMR, S3, Lambda, Step Functions, and CloudWatch
  • Hands-on experience with distributed data processing frameworks like Apache Spark, Hive, or Presto
  • Experience with Kafka, NiFi, Amazon Web Services (AWS), Maven, Ambari-TEZ, Stash, and Bamboo
  • Familiarity with data loading tools such as Talend and Sqoop
  • Familiarity with cloud databases such as AWS Redshift, Aurora MySQL, and PostgreSQL
  • Knowledge of workflow schedulers such as Oozie or Apache Airflow
  • Strong knowledge of shell scripting, Python, or Java for scripting and automation
  • Familiarity with SQL and query optimization techniques
  • Experience in production support for large-scale distributed systems or data platforms
  • Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios
Job Responsibility
  • Monitor, troubleshoot, and resolve issues in real time for AWS EMR clusters and associated data pipelines
  • Investigate and debug data processing failures, latency issues, and performance bottlenecks
  • Provide support for mission-critical production systems as part of an on-call rotation
  • Manage AWS EMR cluster lifecycle, including creation, scaling, termination, and optimization
  • Ensure effective resource utilization and cost optimization of clusters
  • Apply patches and upgrades to EMR clusters and software components as needed
  • Maintain and support ETL/ELT pipelines built on tools such as Apache Spark, Hive, or Presto running on EMR
  • Ensure data quality, consistency, and availability across pipelines and storage systems such as S3, Redshift, MySQL, or Snowflake
  • Implement and monitor automated workflows using AWS tools like Step Functions, Lambda, and CloudWatch
  • Analyze and optimize EMR job performance by tuning Spark/Hive configurations and improving query efficiency
Employment type: Full-time

Senior Software Engineer, Data

We're the world's leading sports technology company, at the intersection between...
Location: Austria, Vienna
Salary: Not provided
Company: Sportradar
Expiration Date: Until further notice
Requirements
  • 5+ years of data engineering experience with proven track record of leading complex data projects from conception to delivery
  • Exceptional communication skills and experience working in cross-functional teams with analysts, product managers, and business stakeholders
  • AWS & Data Engineering: Very strong hands-on experience with AWS services (S3, Lambda, Glue, Athena, Redshift, EMR, etc.) and proficiency with Apache Spark for large-scale data processing
  • Backend Development: Strong experience with Python for building data processing services and APIs, plus expert-level SQL for data processing and analytics
  • Infrastructure & DevOps: Hands-on experience with Docker, Terraform, and CI/CD pipelines with automation best practices for data systems
  • Clean Code Advocate: Strong commitment to writing clean, maintainable, well-documented code with comprehensive testing and deep knowledge of analytics/reporting requirements
  • Data Architecture: Experience designing scalable data architectures, data modeling, and optimizing data processing workflows
  • Dashboard Development: Experience creating and managing analytics dashboards in BI tools (Tableau, Qlik Sense, Quicksuite, Power BI) and data visualization solutions to present complex insights to stakeholders
Job Responsibility
  • Scale & Performance Engineering: Processing and analyzing terabytes of advertising data with sub-second query performance while building and maintaining robust ETL pipelines using Spark and AWS services to handle massive data volumes daily
  • Data Pipeline Architecture & Development: Designing and building scalable data processing systems, developing backend APIs and microservices (Python or Go), architecting data flows that support both batch and real-time analytics requirements, and managing user-facing dashboards that visualize complex data insights
  • Infrastructure & Data Quality Operations: Implementing robust monitoring and alerting systems to detect data quality issues, managing AWS infrastructure using Terraform, implementing CI/CD best practices, and maintaining high coding standards across data processing systems
  • Cross-Functional Leadership & Collaboration: Leading large-scale data projects from requirements gathering through delivery, bridging technical implementation with business requirements, mentoring team members, and presenting technical concepts to stakeholders while challenging requirements constructively
  • End-to-End Data System Ownership: Taking complete ownership of complex data engineering projects while ensuring high availability and accuracy for both internal stakeholders and external clients, championing clean code principles, and serving as a knowledge leader who supports delivering the right data solutions
What we offer
  • A collaborative environment with colleagues from all over the world (Engineering offices in Europe, Asia and US)
  • Ability to shape your own workday and career via a clearly defined professional and personal development plan
  • Opportunity to work with senior leadership, develop yourself and build your career within an inspiring and fast-growing company and digital sports environment
  • A vibrant and inclusive community, including Women in Tech and Pride groups which welcome all participants
  • A company culture that promotes social aspects, sports, physical exercise and fun
  • Innovative and cross-team challenges like ShipIt, office sports tournaments in Darts and Table Tennis and unique beer brewing competitions
  • Competitive salary and benefits (e.g. retirement pension and insurance plan)
  • Sportradar covers the full €365 cost of your Öffi-Ticket (Jahreskarte), the Vienna public transport annual pass
Employment type: Full-time

Senior Data Engineer – Informatica

Whitehall Resources currently require an experienced Data Engineer to work with ...
Location: United Kingdom, London
Salary: Not provided
Company: Whitehall Resources Ltd
Expiration Date: Until further notice
Requirements
  • Hands-on expertise in AWS & Spark: Amazon EMR, S3, Lambda
  • Strong PySpark/Python and SQL skills for large-scale batch processing
  • Data engineering at scale in government or similarly complex domains, including performance tuning and data quality management
  • CI/CD & DevOps: pipelines and IaC (e.g., Terraform), automated testing, and release governance
  • Version control & collaboration: Git/GitLab, code review, branching strategies, and trunk/PR workflows
  • APIs & integration: building/consuming data services to move and expose data safely and reliably
  • Agile ways of working with Jira/Confluence
  • Clear stakeholder communication and concise technical documentation
  • Security clearance: BPSS (minimum) and SC cleared or SC clearable for UK government work.
Job Responsibility
  • Engineer production-grade data pipelines on AWS (EMR, S3, Lambda), using PySpark/Python and SQL, with a focus on performance, resilience, testing, and observability
  • Migrate and modernise legacy workloads (e.g., ETL jobs and reporting feeds) onto cloud-native services, creating reusable components and shared frameworks
  • Support reporting & MI use cases, including transformations and data models that feed downstream tools (e.g., Power BI)
  • Own CI/CD and version control practices (e.g., Git/GitLab), review code, and enforce engineering standards
  • Coach and mentor engineers, provide technical guidance/code reviews, and contribute to architectural decisions across squads
  • Work in Agile delivery, collaborating across product, data, and platform teams using Jira/Confluence
  • Translate requirements into robust engineering tasks
  • Embed security and compliance by design, aligning with BPSS/SC constraints and department data handling policies
Employment type: Full-time

Data Engineer

Role: Data Engineer. Experience: 10 to 15 years. Full-time permanent (FTE).
Location: United States, New York
Salary: 159,000 USD/year
Company: Realign
Expiration Date: Until further notice
Requirements
  • Hands-on experience in building ETL using Databricks SaaS infrastructure
  • Experience in developing data pipeline solutions to ingest and exploit new and existing data sources
  • Expertise in leveraging SQL, programming language like Python and ETL tools like Databricks
  • Perform code reviews to ensure that code meets requirements, follows optimal execution patterns, and adheres to established standards
  • Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), and AWS Data Integration (Glue)
  • Advanced understanding of containerization and orchestration technologies, including Docker and Kubernetes, and a variety of AWS tools and services
  • Good understanding of AWS Identity and Access Management (IAM), AWS networking, and AWS monitoring tools
  • Proficiency in CI/CD and deployment automation using GitLab pipelines
  • Proficiency in cloud infrastructure provisioning tools (e.g., Terraform)
  • Proficiency in one or more programming languages (e.g., Python, Scala)
Job Responsibility
  • Work on migrating applications from on-premises infrastructure to cloud service providers
  • Develop products and services on the latest technologies through contributions to development, enhancements, testing, and implementation
  • Develop, modify, extend code for building cloud infrastructure, and automate using CI/CD pipeline
  • Partner with the business and peers to pursue solutions that achieve business goals through an agile software development methodology
  • Perform problem analysis, data analysis, reporting, and communication
  • Work with peers across the system to define and implement best practices and standards
  • Assess applications and help determine the appropriate application infrastructure patterns
  • Use the best practices and knowledge of internal or external drivers to improve products or services
Employment type: Full-time

Lead Healthcare Data Engineer - Epic

We are looking for a lead data engineer highly collaborative and comfortable wit...
Location: United States, Remote
Salary: 126,148.63 to 163,993.22 USD/year
Company: Baptist Health
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in computer science or a related field
  • Master's degree preferred
  • Epic certification in Caboodle and Clarity Data Models strongly preferred
  • Must possess a strong understanding of Clarity Data Models and associated databases
  • Proven experience working with data warehouse architecture and solutions
  • At least 10 years (with a Bachelor's) or at least 8 years (with a Master's) of recent experience in data engineering and end-to-end automation of data pipelines
  • Technical skills: data warehousing, strong SQL, and Python
  • Strong understanding of data science and business intelligence workflows
  • Programming experience, ideally in Python and SQL
  • Experience with large-scale data warehousing and analytics projects, including using AWS and GCP technologies
Job Responsibility
  • Partner closely with various teams to design, build and maintain data systems and applications
  • Work with various teams to analyze and design data storage structures and build performant, reliable, and scalable automated data pipelines in a fast-growing data ecosystem
  • Leverage strong understanding of data modeling principles and modern data platforms to properly design and implement data pipeline solutions
  • Provide production support and adhere to the defined SLA(s)
Employment type: Full-time

Staff Software Engineer, Data Infrastructure

At Docker, we make app development easier so developers can focus on what matter...
Location: United States, Seattle
Salary: 195,400 to 275,550 USD/year
Company: Docker
Expiration Date: Until further notice
Requirements
  • 8+ years of software engineering experience with 3+ years focused on data engineering and analytics systems
  • Expert-level experience with Snowflake including advanced SQL, performance optimization, and cost management
  • Deep proficiency in DBT for data modeling, transformation, and testing with experience in large-scale implementations
  • Strong expertise with Apache Airflow for complex workflow orchestration and pipeline management
  • Hands-on experience with Sigma or similar modern BI platforms for self-service analytics
  • Extensive AWS experience including data services (S3, Redshift, EMR, Glue, Lambda, Kinesis) and infrastructure management
  • Proficiency in Python, SQL, and other programming languages commonly used in data engineering
  • Experience with infrastructure-as-code, CI/CD practices, and modern DevOps tools
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • Proven track record designing and implementing large-scale distributed data systems
Job Responsibility
  • Define and drive the technical strategy for Docker's data platform architecture, establishing long-term vision for scalable data systems
  • Lead design and implementation of highly scalable data infrastructure leveraging Snowflake, AWS, Airflow, DBT, and Sigma
  • Architect end-to-end data pipelines supporting real-time and batch analytics across Docker's product ecosystem
  • Drive technical decision-making around data platform technologies, architectural patterns, and engineering best practices
  • Establish technical standards for data quality, testing, monitoring, and operational excellence
  • Design and build robust, scalable data systems that process petabytes of data and support millions of user interactions
  • Implement complex data transformations and modeling using DBT for analytics and business intelligence use cases
  • Develop and maintain sophisticated data orchestration workflows using Apache Airflow
  • Optimize Snowflake performance and cost efficiency while ensuring reliability and scalability
  • Build data APIs and services that enable self-service analytics and integration with downstream systems
What we offer
  • Freedom & flexibility: fit your work around your life
  • Designated quarterly Whaleness Days plus an end-of-year Whaleness break
  • Home office setup: we want you comfortable while you work
  • 16 weeks of paid parental leave
  • Technology stipend equivalent to $100 net/month
  • PTO plan that encourages you to take time to do the things you enjoy
  • Training stipend for conferences, courses and classes
  • Equity
Employment type: Full-time