Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.
Job Responsibilities:
Design, develop, and implement efficient and scalable data pipelines and ETL processes using PySpark and Python
Develop, optimize, and maintain complex SQL queries and stored procedures within Oracle database environments
Collaborate with data architects, data scientists, and other stakeholders to understand data requirements and translate them into technical solutions
Perform data analysis, profiling, and quality checks to ensure data accuracy and integrity
Optimize PySpark and Python code for performance and efficiency on large datasets
Troubleshoot and resolve data-related issues, ensuring data availability and reliability
Participate in code reviews, testing, and deployment processes
Stay up-to-date with emerging technologies and best practices in data engineering and big data
Document technical designs, data flows, and code
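The responsibilities above center on building ETL steps in Python and writing SQL against a relational database. As a rough illustration of the kind of extract-transform-load work involved (a minimal sketch only: it uses Python's stdlib sqlite3 in place of Oracle/PySpark, and the table and column names are hypothetical):

```python
import sqlite3

# Minimal ETL sketch: extract raw staging rows, transform them,
# and load the cleaned result into a target table.
# Table/column names (staging_orders, orders_clean) are hypothetical;
# a production pipeline would typically use PySpark against Oracle.

def run_etl(conn: sqlite3.Connection) -> int:
    cur = conn.cursor()
    # Extract: read raw rows from the staging table
    rows = cur.execute("SELECT id, amount FROM staging_orders").fetchall()
    # Transform: drop invalid records and normalize amounts to integer cents
    cleaned = [
        (oid, round(amt * 100))
        for oid, amt in rows
        if amt is not None and amt >= 0
    ]
    # Load: write the cleaned rows into the curated target table
    cur.executemany(
        "INSERT INTO orders_clean (id, amount_cents) VALUES (?, ?)", cleaned
    )
    conn.commit()
    return len(cleaned)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE staging_orders (id INTEGER, amount REAL);
        CREATE TABLE orders_clean (id INTEGER, amount_cents INTEGER);
        INSERT INTO staging_orders VALUES (1, 19.99), (2, NULL), (3, -5.0), (4, 3.5);
    """)
    print(run_etl(conn))  # prints the count of valid rows loaded
```

The same extract/transform/load shape carries over to PySpark, where the transform step would be expressed as DataFrame operations and the load as a write to the target schema.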
Requirements:
Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field
3+ years of experience in data engineering or software development with a focus on PySpark/Python
Proven expertise in developing data processing applications using PySpark and Python
Strong proficiency in writing complex SQL queries and stored procedures, and a solid understanding of database schemas in Oracle
Experience with Apache Spark and its ecosystem for big data processing
Solid understanding and experience with ETL methodologies and tools
Experience with job scheduling and automation tools, such as Autosys
Proficiency in Unix/Linux shell scripting for automation and system tasks
Experience with Git or similar version control systems
Excellent analytical and problem-solving skills with attention to detail
Nice to have:
Experience with other database systems (e.g., PostgreSQL, SQL Server)
Knowledge of cloud platforms (AWS, Azure, GCP) and their data services (e.g., AWS S3, EMR, Glue)
Familiarity with data orchestration tools (e.g., Apache Airflow)
Experience with data visualization tools (e.g., Tableau, Power BI)
Understanding of data warehousing concepts (e.g., Kimball, Inmon)