The Big Data Developer is a senior level position responsible for establishing and implementing scalable, efficient big data application systems and platforms—primarily across Hadoop/Spark and cloud environments—in coordination with the Technology team. The overall objective of this role is to lead big data systems analysis, data engineering, and applications programming activities.
Job Responsibilities:
Partner with multiple management teams to ensure appropriate integration of functions to meet goals, and to identify and define necessary platform and system enhancements to deploy new data products and process improvements
Design and implement scalable and efficient Hadoop architecture solutions encompassing core ecosystem components, including HDFS, YARN, MapReduce, Hive, HBase, and Spark
Collaborate with data engineers, data scientists, and analytics stakeholders to understand data requirements and deliver robust, reliable pipelines and analytical datasets
Develop Spark/PySpark solutions to support near real-time data ingestion, analytics, and reporting, ensuring high performance and reliability
Optimize Hadoop and Spark clusters for performance and resource utilization, including capacity planning, tuning, and job orchestration best practices
Maintain and monitor Hadoop infrastructure to ensure high availability, reliability, and observability; implement proactive alerting, logging, and issue resolution
Implement and enforce data security and governance policies (e.g., access controls, encryption, data quality, lineage, and cataloging) across big data platforms
Troubleshoot and resolve issues across the Hadoop ecosystem (jobs, services, resource management), driving root-cause analysis and permanent fixes
Provide expertise in the area and advanced knowledge of applications programming, ensuring application and data solution design adheres to the overall architecture blueprint and cloud reference patterns
Utilize advanced knowledge of system flow to develop standards for coding, testing, debugging, deployment, and implementation—leveraging Python, PySpark, Unix/Linux, and SQL
Develop comprehensive knowledge of how architecture, data platforms, and infrastructure integrate to accomplish business goals, including data modeling, ETL processes, data warehousing, and cloud-native services (AWS, Azure, Google Cloud)
Provide in-depth analysis with interpretive thinking to define issues and develop innovative, scalable solutions aligned with business and regulatory requirements
Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary and uplifting engineering practices through code reviews and mentorship
Stay updated with the latest advancements in Hadoop/big data technologies and related areas; evaluate and introduce improvements, including AI/ML lifecycle management, MLOps, and GenAI-adjacent integrations where appropriate
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm’s reputation and safeguarding Citigroup, its clients, and assets by driving compliance with applicable laws, rules, and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct, and business practices, and escalating, managing, and reporting control issues with transparency
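To make the pipeline and data-quality responsibilities above concrete, here is a minimal, framework-free Python sketch of the ingest → validate → transform pattern they describe. In production this logic would typically run as a PySpark job on a Hadoop/YARN cluster; the record layout, field names, and function names here are invented purely for illustration.

```python
# Toy sketch of an ingest -> validate -> transform pipeline step.
# The schema (trade_id/amount/currency) and all names are hypothetical.

from datetime import datetime, timezone

REQUIRED_FIELDS = {"trade_id", "amount", "currency"}  # hypothetical schema


def validate(record: dict) -> bool:
    """Basic data-quality gate: required fields present, amount numeric."""
    if not REQUIRED_FIELDS <= record.keys():
        return False
    return isinstance(record["amount"], (int, float))


def transform(record: dict) -> dict:
    """Normalize a valid record and stamp it for lineage tracking."""
    return {
        "trade_id": record["trade_id"],
        "amount": round(float(record["amount"]), 2),
        "currency": record["currency"].upper(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def run_pipeline(raw_records):
    """Split input into transformed good records and quarantined bad ones."""
    good, quarantined = [], []
    for rec in raw_records:
        (good if validate(rec) else quarantined).append(rec)
    return [transform(r) for r in good], quarantined
```

Quarantining invalid records rather than dropping them silently mirrors the data-quality and lineage expectations listed above; in a real Spark deployment the same split would typically be expressed with DataFrame filters and written to separate sinks.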
Requirements:
6+ years of relevant experience in Big Data/Application Development or systems analysis roles, including building and operating production-grade data pipelines on Hadoop/Spark
Extensive experience in systems analysis and in programming big data applications and data platforms
Proven experience designing and managing Hadoop-based architectures, including cluster configuration, resource management (YARN), and ecosystem integration
Strong understanding and hands-on expertise with the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, HBase, and Spark
Strong hands-on and architectural knowledge of Python, PySpark, Unix/Linux, and SQL
Experience with data modeling, ETL processes, and data warehousing concepts and implementation
Experience implementing data security and governance (e.g., RBAC, encryption, data quality, data lineage, catalog)
Exposure to AI/ML lifecycle management, MLOps, and GenAI solution patterns and integration points
Experience with major cloud platforms—AWS, Azure, Google Cloud—and related big data services (e.g., EMR, HDInsight, Dataproc, Databricks)
Subject Matter Expert (SME) in at least one area of Big Data/Application Development (e.g., Spark performance tuning, Hive optimization, HBase administration, data security)
Experience in managing and implementing successful projects; demonstrated leadership and project management skills
Ability to adjust priorities quickly as circumstances dictate
Consistently demonstrates clear and concise written and verbal communication