We are assembling an A-team of highly skilled, autonomous, AI-first engineers, and we are seeking an exceptional Full Stack Data Engineer to join our high-performing, co-located squads in Canada. This role is for a hands-on engineer who is passionate about leveraging data, proficient in building end-to-end data solutions, and deeply committed to using AI tools to maximize productivity. The ideal candidate will be instrumental in designing, developing, and optimizing robust data pipelines, from ingestion to consumption, using Python, PySpark, and other big data technologies. We are looking for an AI-first thinker who can develop a deep understanding of the functional domains our work impacts and contribute significantly to our data strategy and culture.
Job Responsibilities
Operate end-to-end in the design, development, and implementation of full-stack data solutions, ensuring optimal performance, scalability, data quality, security, and compliance across the data lifecycle
Collaborate closely within small, co-located squads (4-7 person teams), fostering an environment of high communication and minimal coordination overhead, to deliver impactful data products
Develop, maintain, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques for large-scale datasets
Implement sophisticated data storage solutions leveraging a diverse set of big data technologies including Hive, distributed file systems (e.g., HDFS, S3), and enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB)
Design and implement scalable data models and schemas that support advanced analytics, machine learning, and critical reporting needs, ensuring data integrity, accessibility, and discoverability
Engage effectively with data consumers, data scientists, and business stakeholders to deeply understand their requirements, translating them into robust data solutions and providing expert guidance on data utilization and interpretation
Implement real-time data streaming and complex event-driven architectures using technologies like Apache Kafka, ensuring low-latency data availability for critical business functions
Adhere to and contribute to best practices in data engineering and software development, participating in rigorous code reviews, implementing comprehensive automated testing strategies, and supporting robust CI/CD pipelines within a DevOps culture
Exhibit high autonomy and agency, taking ownership of technical challenges, making well-reasoned architectural decisions, and proactively identifying and implementing continuous improvements across the data landscape
Innovate with AI-powered development, actively leveraging, integrating, and contributing to AI coding tools (e.g., internal Citi AI tools, Copilot, Claude Code, Codex, Antigravity) to significantly enhance productivity, code quality, and development velocity, and inspiring others to do the same
Participate in technical discussions and contribute to the evolution of our big data technology stack, evaluating new technologies, and making strategic recommendations that align with business objectives and architectural vision
Expertly troubleshoot and resolve challenging technical issues within complex, distributed big data environments, applying advanced analytical and problem-solving methodologies
Requirements
Experience: 4+ years of progressive, hands-on experience as a Data Engineer, with a proven track record of delivering complex, large-scale data solutions
Expert-level proficiency in Python, with deep expertise in developing highly optimized, scalable, and production-grade PySpark applications for mission-critical data processing
Deep understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming)
Advanced proficiency with Hive for enterprise data warehousing, including optimization techniques for large and complex queries
Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
Proficiency in SQL, complex query optimization, and advanced data warehousing concepts (e.g., dimensional modeling, data vault, data lakes)
Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase) and understanding of their architectural trade-offs
Expert-level experience with Apache Kafka, including design and implementation of high-throughput, low-latency real-time data pipelines and event-driven architectures
Extensive experience with big data services on major cloud platforms (e.g., AWS: EMR, Glue, Redshift, Kinesis; Azure: Databricks, Data Factory, Synapse, Event Hubs; GCP: Dataflow, Dataproc, BigQuery, Pub/Sub), including cloud-native architectural patterns
Mandatory: Demonstrated mastery and innovative application of AI coding tools (e.g., Claude Code, Codex, Antigravity) to significantly enhance the development lifecycle
A proactive, 'AI-first thinker' mindset, with a proven ability to evaluate, integrate, and evangelize new AI tools and methodologies within the team to drive continuous improvement and innovation
Expert ability to articulate the intricacies of the functional domain, proactively identifying business challenges and opportunities, and translating them into impactful, data-driven solutions
Advanced understanding of software engineering principles, design patterns, data structures, algorithms, and performance engineering for distributed systems
Extensive experience with RESTful API design, development, and integration for data services
Strong expertise in containerization technologies (e.g., Docker, Kubernetes) and orchestration for deploying and managing scalable data applications
Master-level proficiency with version control systems, especially Git, including advanced branching, merging, and code review strategies
Exceptional problem-solving, analytical, and debugging skills applied to highly complex, distributed big data ecosystems
Superior communication, presentation, and interpersonal skills, with the ability to articulate complex technical concepts to diverse audiences and influence strategic decisions
Demonstrated high autonomy and agency in driving strategic initiatives and delivering impactful, innovative data solutions
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related quantitative field is required. Equivalent advanced practical experience, with a demonstrable track record of architecting and delivering major data initiatives, will also be considered.