We are building an A-team of highly skilled, autonomous, and AI-first engineers, and we are looking for an ambitious Full Stack Data Engineer to join our focused squads in Pune. This role is designed for a hands-on engineer who is passionate about leveraging data, proficient in building end-to-end data solutions, and deeply committed to using AI tools to maximize productivity. The ideal candidate will be instrumental in designing, developing, and optimizing robust data pipelines, from ingestion to consumption, using Python, PySpark, and other big data technologies. We seek an individual with strong domain understanding who can contribute to our AI-first culture and help shape the future of our data platforms.
Job Responsibilities
Own the end-to-end design, development, and implementation of full-stack data solutions, ensuring optimal performance, scalability, data quality, and security across the data lifecycle.
Collaborate closely within small, co-located squads (4-7 person teams) that keep communication high and coordination overhead low, translating complex business requirements into technical specifications for data engineering solutions.
Develop, maintain, and optimize data ingestion, processing, and transformation pipelines using Python and PySpark for large-scale datasets.
Implement data storage solutions using big data technologies such as Hive, distributed file systems (e.g., HDFS, S3), and, where appropriate, NoSQL databases.
Design and implement data models and schemas optimized for analytics and reporting, ensuring data integrity and accessibility.
Work with data consumers (e.g., analysts, data scientists) to understand their needs and provide efficient access to processed data, including through reporting tools such as Tableau where needed.
Implement and manage real-time data streaming and event-driven architectures using technologies like Apache Kafka.
Champion best practices in data engineering and software development, including rigorous code reviews, comprehensive testing, and continuous integration and continuous deployment (CI/CD) pipelines.
Demonstrate high autonomy and agency in driving data projects forward, making informed technical decisions, and proactively identifying opportunities to improve data quality and efficiency.
Proactively leverage and contribute to the development of AI-powered development tools, including AI tools available internally at Citi such as Copilot, Claude Code, Codex, and Antigravity, to significantly enhance productivity and code quality and to accelerate development cycles.
Participate in technical discussions and contribute to the evolution of our big data technology stack, always seeking innovative approaches to data challenges.
Troubleshoot and resolve complex technical issues within data pipelines and big data environments, demonstrating strong analytical and problem-solving skills.
Requirements
Experience: 4-5 years of hands-on work as a Data Engineer, with a strong focus on building end-to-end data solutions using big data technologies.
Expert proficiency in Python, with proven experience in developing scalable data processing applications.
Strong understanding and hands-on experience with Apache Spark, particularly PySpark, for large-scale data processing.
Solid experience with Hive for data warehousing and querying large datasets.
Familiarity with distributed computing fundamentals and components like HDFS.
Proficiency in SQL and experience with data warehousing concepts.
Experience with data storage formats (e.g., Parquet, ORC, Avro) and cloud-based data lake solutions (e.g., S3).
Experience with Apache Kafka for building real-time data pipelines and event-driven architectures.
Proven effectiveness with AI coding tools (e.g., Claude Code, Codex, Antigravity) is expected, and a strong willingness to adopt and maximize their usage is essential.
An "AI-first thinker" mindset, demonstrating how to leverage and integrate AI tools into the development workflow for continuous improvement.
Strong ability to articulate the functional domain you are working in, understand the business context, and explain the "why" behind technical data solutions.
Strong understanding of data structures, algorithms, and performance optimization techniques for large-scale data processing.
Experience with RESTful API design and development for data ingestion or exposure points.
Expert proficiency with version control systems, especially Git.
Exceptional problem-solving, analytical, and debugging skills in complex, distributed data environments.
Superior communication and interpersonal skills, with the ability to work effectively and autonomously within small, high-performing teams, and to collaborate with various stakeholders.
Demonstrated high autonomy and agency in tackling complex challenges and delivering impactful data solutions.
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related quantitative field is required. Equivalent practical experience with a demonstrable track record of excellence will also be considered.
Nice to have
Knowledge of data visualization tools like Tableau is beneficial but not mandatory.
Familiarity with containerization technologies (e.g., Docker, Kubernetes) for deploying data applications is a plus.