Citi is seeking a highly skilled and experienced Senior Data Engineer to join our dynamic and innovative technology team. The ideal candidate will have a robust background in data engineering, with deep expertise in a variety of modern data technologies and a proven track record of working on large-scale data projects. This role will be pivotal in designing, building, and optimizing our data infrastructure on cloud platforms, and will also provide exposure to cutting-edge Artificial Intelligence projects, including Retrieval-Augmented Generation (RAG) and Agentic AI systems. The candidate must be proficient in Agile methodologies and possess strong leadership and client-facing skills to guide projects to successful completion while balancing stakeholder needs and organizational goals.
Job Responsibilities:
Design, build, and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks, ensuring efficient ingestion, transformation, and integration of large-scale datasets across cloud platforms (a minimal pipeline sketch follows this list).
Cloud Data Platform Management: Implement and manage data solutions on cloud platforms (e.g., AWS, GCP, Azure). Leverage cloud-native services for data storage, processing, and analytics.
Big Data Technologies: Work extensively with big data frameworks and platforms such as Databricks, Snowflake, and open table formats like Apache Iceberg to process and analyze petabyte-scale datasets.
Optimize Spark workloads and Databricks clusters by tuning jobs, managing partitioning strategies, caching, and autoscaling to improve performance, reduce processing time, and control infrastructure costs.
Implement and manage Lakehouse architecture using Delta Lake, enforcing data quality, schema evolution, and governance (e.g., Unity Catalog), while ensuring reliable, secure, and high-quality data for analytics and downstream applications.
Lead the design and architecture of Starburst-based data solutions, ensuring scalability, performance, and reliability for enterprise-level data platforms.
Implement and manage data federation strategies using Starburst connectors to integrate and query data across disparate systems (e.g., data lakes, RDBMS, NoSQL databases, cloud storage); a federated-query sketch follows this list.
Performance Optimization: Identify and resolve performance bottlenecks in data pipelines and queries. Optimize data storage and processing for cost and efficiency.
Develop and optimize robust data pipelines with a strong focus on data governance, ensuring high data quality, comprehensive data lineage, and efficient, compliant data flow from ingestion to consumption for analytical and operational needs.
Data Modeling and Architecture: Design and implement data models that support business intelligence, analytics, and machine learning use cases. Ensure data architecture is robust, scalable, and secure.
AI and Machine Learning Collaboration: Partner with data scientists and AI specialists to support the development and deployment of AI models. Contribute to innovative projects involving RAG and Agentic AI by providing the necessary data infrastructure and support.
Agile Methodology: Operate effectively within an Agile development environment, actively participating in sprint planning, daily stand-ups, and retrospectives to ensure iterative and timely delivery of project milestones.
Leadership and Project Guidance: Provide technical leadership to steer the project in the right direction, making critical decisions that align with both client interests and the organization's strategic benefits. Mentor junior engineers and promote best practices.
Stakeholder and Client Interaction: Serve as a key point of contact for stakeholders and clients. Effectively communicate project progress, manage expectations, and translate complex business requirements into actionable technical tasks.
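To make the scope of the pipeline and Lakehouse responsibilities above concrete, here is a minimal PySpark sketch of a batch ingestion job writing to a partitioned Delta table. The paths, table layout, and column names are illustrative assumptions, not part of the role description.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Build a Spark session with Delta Lake support enabled
# (assumes the delta-spark package is available on the classpath).
spark = (
    SparkSession.builder
    .appName("daily-trades-ingest")  # hypothetical job name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Ingest raw CSV files from cloud storage (path is a placeholder).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/trades/")

# Basic cleansing and typing: drop records failing a simple quality rule.
clean = (
    raw.withColumn("trade_ts", F.to_timestamp("trade_ts"))
       .withColumn("notional", F.col("notional").cast("double"))
       .filter(F.col("notional").isNotNull())
)

# Write to a partitioned Delta table; partitioning by trade date keeps
# downstream queries pruned and infrastructure costs predictable.
(
    clean.withColumn("trade_date", F.to_date("trade_ts"))
         .write.format("delta")
         .mode("append")
         .partitionBy("trade_date")
         .save("s3://example-bucket/lakehouse/trades")
)
```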
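Similarly, the data federation responsibility can be pictured with the open-source trino Python client, which also connects to Starburst clusters. The host, catalogs, and table names below are hypothetical, and authentication is omitted for brevity.

```python
import trino

# Connect to a Starburst/Trino coordinator (host and user are placeholders;
# a real deployment would also configure authentication).
conn = trino.dbapi.connect(
    host="starburst.example.com",
    port=443,
    user="data_engineer",
    http_scheme="https",
    catalog="hive",
    schema="default",
)

cur = conn.cursor()

# A single federated query joining a Hive data-lake table with a
# PostgreSQL dimension table, each addressed through its own catalog.
cur.execute("""
    SELECT d.region, count(*) AS trade_count
    FROM hive.lake.trades t
    JOIN postgresql.crm.desks d ON t.desk_id = d.desk_id
    GROUP BY d.region
""")

for region, trade_count in cur.fetchall():
    print(region, trade_count)
```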
Requirements:
Python: Expert-level proficiency with Python and its data ecosystem (e.g., Pandas, NumPy, Dask). Experience should include writing production-grade code for data processing, automation, and API development.
PySpark: Extensive hands-on experience with the Spark framework, including deep knowledge of the DataFrame API, Spark SQL, and performance tuning techniques for distributed data processing.
Databricks: Proven experience developing on the Databricks Lakehouse Platform, including proficiency with Delta Lake, structured streaming, and optimizing Spark jobs within the Databricks environment.
Ab Initio: Strong, practical experience with the Ab Initio suite of products (GDE, Co>Operating System, Conduct>It) for designing and implementing enterprise-grade ETL workflows.
Snowflake: Hands-on experience designing, building, and maintaining data warehouses in Snowflake. This includes data modeling, implementing role-based access control (RBAC), performance tuning, and utilizing features like Snowpipe and Time Travel (see the connector sketch after this list).
Starburst/Trino: Experience using federated query engines to provide unified access across disparate data sources, including an understanding of the principles of query federation and experience connecting to various data systems.
Apache Iceberg: Familiarity or experience with open table formats like Apache Iceberg for managing large analytic datasets.
In-depth knowledge and multi-year experience with at least one major cloud provider (AWS, Google Cloud Platform, or Azure).
Practical experience building and managing data pipelines using cloud-native services such as AWS Glue, Lambda, S3, and Redshift; Azure Data Factory and Synapse Analytics; or Google Cloud Composer, Dataflow, and BigQuery.
A solid understanding of the data lifecycle required for machine learning projects.
Experience in building data pipelines to support AI/ML models. Exposure to or a strong interest in preparing data for advanced AI applications, such as building ingestion and transformation pipelines for vector databases used in Retrieval-Augmented Generation (RAG) and Agentic AI systems (a chunk-and-embed sketch follows this list).
Agile Proficiency: Deep familiarity with Agile and Scrum methodologies, with a proven ability to deliver projects iteratively and adapt to changing requirements.
Leadership & Influence: Demonstrated ability to provide technical leadership, influence architectural decisions, and steer projects towards successful outcomes that align with both client needs and long-term organizational strategy.
Client Engagement: Exceptional communication and interpersonal skills, with proven proficiency in client interaction. Must be able to articulate complex technical concepts to diverse audiences and build strong stakeholder relationships.
6-10 years of hands-on experience in data engineering, preferably within a large-scale enterprise or financial services environment.
Demonstrable experience leading project work streams and mentoring junior team members.
Relevant industry certifications (e.g., AWS Certified Big Data, Google Professional Data Engineer, Snowflake SnowPro).
Experience with containerization technologies like Docker and orchestration tools like Kubernetes.
Deep understanding of data governance, data quality, and data security principles.
Excellent analytical and problem-solving skills with the ability to work independently or as part of a team.
Experience as an Applications Development Manager
Experience at a senior level in an Applications Development role
Stakeholder and people management experience
Demonstrated leadership skills
Proven project management skills
Basic knowledge of industry practices and standards
Consistently demonstrates clear and concise written and verbal communication
Bachelor's degree/University degree or equivalent experience
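As a small illustration of the Snowflake and Time Travel skills listed above, the sketch below uses the snowflake-connector-python package; the account, warehouse, database, and table identifiers are placeholders.

```python
import snowflake.connector

# Open a session (all identifiers below are placeholders).
conn = snowflake.connector.connect(
    account="example_account",
    user="DATA_ENGINEER",
    password="...",            # in practice, prefer key-pair auth or SSO
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

cur = conn.cursor()
try:
    # Time Travel: query the table as it looked one hour ago.
    cur.execute("SELECT count(*) FROM trades AT(OFFSET => -3600)")
    print(cur.fetchone()[0])
finally:
    cur.close()
    conn.close()
```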
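And for the RAG-oriented data preparation mentioned in the requirements, a minimal chunk-and-embed pass might look like the following. The model choice and chunking parameters are assumptions, and the vector-store upsert itself is omitted because it is store-specific.

```python
from sentence_transformers import SentenceTransformer

# A small, widely available embedding model (an assumption; any
# embedding model could be substituted).
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

documents = {"doc-1": "example source document text"}  # placeholder corpus

records = []
for doc_id, text in documents.items():
    chunks = chunk(text)
    vectors = model.encode(chunks)  # one embedding per chunk
    for i, (piece, vec) in enumerate(zip(chunks, vectors)):
        # Each record is ready to upsert into a vector database;
        # the upsert call is store-specific and omitted here.
        records.append({"id": f"{doc_id}-{i}",
                        "vector": vec.tolist(),
                        "text": piece})

print(f"prepared {len(records)} chunks for indexing")
```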
What we offer:
medical, dental & vision coverage
401(k)
life, accident, and disability insurance
wellness programs
paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays