The Applications Development Technology Lead Analyst is a senior-level position responsible for establishing and implementing new or revised application systems and programs in coordination with the Technology team. The overall objective of this role is to lead applications systems analysis and programming activities.
Job Responsibilities:
Design and implement scalable, fault-tolerant batch and real-time data processing pipelines
Develop robust data models and schema designs optimized for both performance and storage efficiency
Evaluate and integrate emerging tools and frameworks (e.g., Spark, Flink, Kafka) into the existing stack
Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
Legacy Systems Decommissioning: Lead the strategic migration of data and logic from legacy platforms (e.g., on-premises SQL Servers) to a modern Data Lakehouse environment
ETL/ELT Transformation: Re-engineer existing stored procedures and complex legacy ETL jobs into scalable, distributed processing frameworks using Spark (Python) and Starburst/Trino
Validation & Parity Testing: Design and implement automated frameworks for Data Parity Testing to ensure 100% accuracy and consistency between legacy outputs and new big data results (a sketch of such a check follows this list)
Schema Evolution: Map and transform rigid, legacy relational schemas into flexible, high-performance formats optimized for the cloud (e.g., Parquet, Avro, or Iceberg)
Phased Cutover Management: Orchestrate a phased migration strategy (Parallel Run, Shadow Execution) to ensure zero downtime for downstream business applications and reporting tools
Performance Benchmarking: Establish performance baselines on legacy systems and ensure the new Big Data architecture meets or exceeds those benchmarks at scale
Resolve a variety of high-impact problems and projects through in-depth evaluation of complex business processes, system processes, and industry standards
Write clean, high-performance code in Python
Optimize complex SQL queries and fine-tune distributed computing clusters to reduce latency and costs
Ensure data integrity and security by implementing rigorous validation and encryption standards
Build and maintain CI/CD pipelines for automated testing and deployment of data jobs
Monitor system health and troubleshoot performance bottlenecks across the data lifecycle
Provide technical mentorship and conduct code reviews for junior and mid-level engineers
Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
Translate complex business requirements into technical specifications
Collaborate with Product Managers to ensure data availability for downstream analytics, business models and users
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
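
For illustration only, here is a minimal PySpark sketch of the kind of check described under Validation & Parity Testing above. Every name in it (paths, tables, key columns) is a placeholder assumption, not a detail of the actual framework:

```python
# Minimal sketch of an automated parity check; table names, paths, and key
# columns below are illustrative assumptions, not details from this posting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("parity-check").getOrCreate()

# Assumed inputs: a CSV extract of the legacy SQL Server output and the
# migrated Lakehouse output in Parquet.
legacy = spark.read.option("header", True).csv("/staging/legacy/orders_daily.csv")
modern = spark.read.parquet("/lakehouse/gold/orders_daily")

key_cols = ["order_id", "business_date"]  # assumed business key

# 1) Row-count parity.
print("row counts:", legacy.count(), modern.count())

# 2) Key parity: business keys present on only one side.
only_legacy = legacy.select(key_cols).subtract(modern.select(key_cols))
only_modern = modern.select(key_cols).subtract(legacy.select(key_cols))
print("keys only in legacy:", only_legacy.count())
print("keys only in modern:", only_modern.count())

# 3) Value parity on shared keys, column by column (null-safe comparison,
#    so NULL vs. value counts as a mismatch and NULL vs. NULL as a match).
joined = legacy.alias("l").join(modern.alias("m"), key_cols, "inner")
mismatches = {
    c: joined.filter(~F.col(f"l.{c}").eqNullSafe(F.col(f"m.{c}"))).count()
    for c in legacy.columns if c not in key_cols
}
print("per-column mismatches:", {c: n for c, n in mismatches.items() if n})
```

A production framework would also normalize types between the CSV extract and the Parquet output and report the mismatching rows themselves, but the shape of the checks (row counts, key coverage, per-column values) stays the same.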
Requirements:
Highly experienced and skilled technical lead with 12+ years of experience in software development and platform engineering
Experience in Data Engineering, focused on Big Data ecosystems
Knowledge of Hadoop, YARN, Hive, Impala, Spark, and Spark SQL, with extensive experience developing high-volume data processing pipelines
Expert-level programming skills and hands-on experience in Python
Familiarity with data formats like Avro, Parquet, CSV, JSON
Hands-on experience in writing SQL queries
Highly experienced with Unix based operating systems and shell scripting
Experience with source code management tools such as Git and Bitbucket
Big Data proficiency and hands-on experience with Hadoop, Spark, Hive, Kafka, and NoSQL databases (MongoDB, HBase)
Experience working with query engines such as Trino, Presto, and Starburst (see the sketch after this list)
Strong computer science fundamentals in data structures, algorithms, databases, and operating systems
Reverse Engineering: ability to read "spaghetti" SQL or legacy scripts and document the business logic before migrating it
Data Lineage: experience using tools such as Collibra or Informatica to track where data comes from and where it is going
Change Management: experience managing the technical "shock" to the business when switching from legacy BI tools to modern query engines such as Starburst
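
For illustration only, a minimal sketch of exercising a distributed query engine from Python, assuming a reachable Trino/Starburst coordinator; the host, catalogs, and table names are placeholders rather than anything from this posting:

```python
# Minimal sketch using the trino Python client; the host, catalogs, and
# table names are made up for illustration.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",  # assumed Starburst/Trino coordinator
    port=8080,
    user="data_engineer",
    catalog="iceberg",              # assumed Lakehouse catalog
    schema="gold",
)
cur = conn.cursor()

# One federated query compares an aggregate across the migrated Iceberg
# table and the legacy table exposed through a hypothetical SQL Server catalog.
cur.execute("""
    SELECT 'modern' AS side, count(*) AS n, sum(amount) AS total
    FROM iceberg.gold.orders_daily
    UNION ALL
    SELECT 'legacy' AS side, count(*) AS n, sum(amount) AS total
    FROM sqlserver.dbo.orders_daily
""")
for side, n, total in cur.fetchall():
    print(side, n, total)
```

Federating the legacy and migrated tables through one engine like this is a common way to support the parity testing and performance benchmarking responsibilities listed above.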
Nice to have:
Problem Solver: You don't just fix bugs; you identify the root cause to prevent recurrence
Communicator: You can explain the "why" behind a technical decision to non-technical stakeholders
Automation and AI Mindset: You believe that if a task has to be done twice, it should be automated, and you are familiar with AI tools that expedite delivery