Data Analytics Intermediate Engineer

Citi (https://www.citi.com/)

Location:
Irving, United States

Category:
IT - Software Development

Contract Type:
Employment contract

Salary:
76,230.00 - 106,370.00 USD / year

Job Description:

The Data Analytics Intermediate Engineer is a hands-on technical contributor who designs, builds, and maintains scalable data pipelines and infrastructure within large enterprise data environments. This role supports various analytical and operational needs, collaborating with cross-functional teams and applying solid knowledge of big data technologies, programming, and data governance practices.

Job Responsibility:

  • design and implement data ingestion, transformation, and cleansing pipelines using PySpark, SQL, and Python/Java (a minimal PySpark sketch follows this list)
  • work on structured and unstructured datasets stored in HDFS, Hive, Parquet, or cloud-based storage
  • optimize existing data workflows and jobs for performance, scalability, and reliability
  • support batch and streaming data processing frameworks across Big Data platforms (e.g., Hadoop, Spark, Hive, Kafka)
  • integrate and process data from multiple sources including APIs, flat files, relational databases, and cloud-native services
  • apply data modeling, partitioning, and file format best practices for efficient storage and querying
  • implement monitoring, logging, and alerting for production pipelines and participate in on-call rotation if required
  • document pipeline logic, data lineage, and schema changes to ensure data transparency and auditability
  • collaborate with data analysts, data scientists, and product owners to translate business needs into scalable data solutions
  • assist in proof-of-concept efforts for new technologies and data integration strategies
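
By way of illustration for the first responsibility, here is a minimal PySpark sketch of an ingest-transform-cleanse-store flow. The paths, schema, and column names (raw_events, event_id, event_ts, amount) are hypothetical placeholders, not details taken from this posting.

    # Minimal PySpark sketch: ingest raw CSV, transform types, cleanse,
    # and store as partitioned Parquet. All names and paths are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ingest-cleanse-demo").getOrCreate()

    # Ingest: read raw CSV from a (hypothetical) landing zone
    raw = spark.read.option("header", True).csv("hdfs:///landing/raw_events/")

    # Transform: cast types and derive a partition column
    events = (
        raw.withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("amount", F.col("amount").cast("double"))
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Cleanse: drop rows missing required fields, then de-duplicate on a key
    clean = events.dropna(subset=["event_ts", "amount"]).dropDuplicates(["event_id"])

    # Store: partitioned Parquet for efficient downstream querying
    clean.write.mode("overwrite").partitionBy("event_date").parquet("hdfs:///curated/events/")

Partitioning the Parquet output by a date column is one common way to satisfy the partitioning and file-format practices the list above calls for.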

Requirements:

  • 2–5 years of experience in a data engineering, ETL development, or big data role
  • strong programming experience in Python (or Java) for data manipulation and automation
  • advanced proficiency in SQL (window functions, joins, CTEs, optimization techniques); a short example follows this list
  • experience working with Apache Spark (PySpark) in a distributed environment
  • hands-on with Hadoop ecosystem tools (Hive, HDFS, Oozie, etc.)
  • familiarity with Git, Jenkins, Airflow, or other CI/CD and orchestration tools
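
As a pointer to the SQL proficiency listed above, here is a small example combining a CTE with a window function, written as Spark SQL so it stays within the PySpark stack this posting names. It assumes a transactions table or temporary view is already registered; all table and column names are illustrative.

    # Illustrative Spark SQL: a CTE plus a window function that keeps the
    # latest transaction per account. Table and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    latest = spark.sql("""
        WITH ranked AS (
            SELECT account_id,
                   amount,
                   txn_ts,
                   ROW_NUMBER() OVER (
                       PARTITION BY account_id ORDER BY txn_ts DESC
                   ) AS rn
            FROM transactions
        )
        SELECT account_id, amount, txn_ts
        FROM ranked
        WHERE rn = 1  -- newest row per account
    """)
    latest.show()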

Nice to have:

  • exposure to cloud platforms (AWS Glue/EMR, Azure Data Factory, GCP Dataflow)
  • knowledge of basic ML workflows (feature engineering, model inputs/outputs); a minimal sketch follows below
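
For the ML-workflows item, here is a minimal feature-engineering sketch using pyspark.ml: raw numeric columns are assembled into the single vector column that Spark ML models take as input. The toy data and column names are assumptions for illustration only.

    # Minimal feature-engineering sketch with pyspark.ml: numeric columns
    # are assembled into one "features" vector. Toy data is hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(12.5, 3), (40.0, 1)], ["amount", "txn_count"])

    assembler = VectorAssembler(inputCols=["amount", "txn_count"], outputCol="features")
    assembler.transform(df).select("features").show()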

What we offer:
  • medical, dental & vision coverage
  • 401(k)
  • life, accident, and disability insurance
  • wellness programs
  • paid time off packages including planned time off (vacation), unplanned time off (sick leave), and paid holidays
  • discretionary and formulaic incentive and retention awards

Additional Information:

Job Posted:
April 29, 2025

Expiration:
May 05, 2025

Employment Type:
Full-time

Work Type:
Hybrid work