We are looking for a bright, dynamic engineer who is motivated and able to work independently as well as in partnership with IT and business teams spread across the globe. The candidate needs to be an exceptionally strong Python and SQL programmer with hands-on experience in GCP-native data technologies, including BigQuery, Dataproc, Cloud Composer, and Datastream. Beyond technical skills, we are looking for a candidate with a strong sense of ownership and the ability to work in a diverse, cross-functional team spanning Engineering, Research, DataOps, and Compliance.
Job Responsibilities:
Build and maintain scalable, distributed, fault-tolerant data pipelines on GCP, including BigQuery-based lakehouse layers and Dataproc-driven Delta Lake workflows
Actively participate in meetings with various stakeholders across data engineering, compliance, and business teams globally
Understand market data processing and transformation needs
Build pipelines to acquire, normalise, transform, and release large volumes of financial data through the OMDP data factory
Design and implement bitemporal data models (valid-time + system-time) on BigQuery to support certified, regulatory-grade time-series datasets (an illustrative as-of query sketch follows this list)
Build, use, and maintain software testing frameworks (unit / non-regression / user acceptance) for data pipelines and transformation logic
Take complete ownership of solutions and assigned tasks, including ingestion pipelines, QA workflows, correction management, and audit trail implementation
Work collaboratively with other team members and contribute to shared platform services rather than vertical-specific implementations
Apply business acumen to understand financial concepts around reference data for equities and other asset classes
Support teams across data and technology in implementing AI solutions and integrating their services with MSCI's data science products and platforms, including AI-assisted ingestion, anomaly detection, and semantic search over the lakehouse using Vertex AI
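To give a flavour of the bitemporal modelling described above, here is a minimal sketch of an "as-of" lookup against a bitemporal BigQuery table. The project, dataset, table, and column names (instrument_prices, valid_from/valid_to, system_from/system_to) are illustrative assumptions, not an actual MSCI schema.

```python
# Sketch: "as-of" lookup on a bitemporal BigQuery table using half-open
# [from, to) intervals. All names below are hypothetical placeholders.
from datetime import datetime, timezone

from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
SELECT instrument_id, price, valid_from, system_from
FROM `my-project.market_data.instrument_prices`
WHERE valid_from  <= @valid_at  AND @valid_at  < valid_to    -- what was true
  AND system_from <= @system_at AND @system_at < system_to   -- what we knew
"""

job = client.query(
    QUERY,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "valid_at", "TIMESTAMP", datetime(2024, 6, 28, tzinfo=timezone.utc)),
            bigquery.ScalarQueryParameter(
                "system_at", "TIMESTAMP", datetime(2024, 7, 1, tzinfo=timezone.utc)),
        ]
    ),
)
for row in job.result():
    print(row.instrument_id, row.price)
```

Keeping the two time axes separate is what lets a certified dataset answer both "what was the price on 28 June?" and "what did we believe that price was, as of the 1 July release?" without destructive updates.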
Requirements:
6-8 years of experience in data engineering
Proficient in Python programming — data pipeline development, transformation logic, and automation scripts
Proficient in data query and analysis using SQL, with strong hands-on experience in BigQuery — partitioning, clustering, materialised views, and time-series query patterns at scale
Hands-on experience building and scheduling pipelines using Cloud Composer (Apache Airflow) — DAG authoring, SLA alerting, retry logic, and dependency management (see the DAG sketch after this list)
Working knowledge of Dataproc (Apache Spark) — batch ingestion, Delta Lake merge operations, and incremental data processing (see the merge sketch after this list)
Proficient with AI-assisted development tools such as GitHub Copilot or Cursor to accelerate code generation and enhance developer productivity
Code versioning and collaboration using Git — branching strategies, pull request workflows, and pipeline-as-code practices
Familiarity with REST APIs — consuming external data vendor APIs and building service-layer integrations
Familiarity with GCP cloud technologies — Cloud Storage, Pub/Sub, Datastream, Cloud Monitoring, IAM, and VPC Service Controls
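As a concrete example of the Cloud Composer skills above, the following is a minimal Airflow DAG sketch showing retry logic, a task SLA, and a simple dependency chain. The DAG id, schedule, and the stored procedure it calls are placeholders rather than a real pipeline.

```python
# Minimal Cloud Composer (Airflow 2.4+) DAG sketch: retries, a task SLA,
# and explicit task dependencies. All ids and SQL are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

default_args = {
    "retries": 3,                            # retry transient failures
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),               # flag tasks that run long
}

with DAG(
    dag_id="market_data_daily_load",
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",                    # daily at 06:00 UTC
    catchup=False,
    default_args=default_args,
) as dag:
    start = EmptyOperator(task_id="start")

    normalise = BigQueryInsertJobOperator(
        task_id="normalise_prices",
        configuration={
            "query": {
                "query": "CALL `my-project.market_data.normalise_prices`()",
                "useLegacySql": False,
            }
        },
    )

    publish = EmptyOperator(task_id="publish")

    start >> normalise >> publish            # dependency management
```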
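Likewise, here is a short sketch of the kind of incremental Delta Lake merge that might run on Dataproc; the GCS paths, join keys, and schema are hypothetical.

```python
# Sketch: incremental Delta Lake upsert (MERGE) on Spark/Dataproc.
# Bucket paths, key columns, and schema are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-incremental-merge")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Newly landed batch of vendor data
updates = spark.read.parquet("gs://my-bucket/landing/prices/dt=2024-06-28/")

target = DeltaTable.forPath(spark, "gs://my-bucket/lakehouse/prices")

(
    target.alias("t")
    .merge(
        updates.alias("s"),
        "t.instrument_id = s.instrument_id AND t.price_date = s.price_date",
    )
    .whenMatchedUpdateAll()      # refresh rows that changed
    .whenNotMatchedInsertAll()   # append rows not seen before
    .execute()
)
```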
Nice to have:
Basic knowledge of data manipulation and analysis libraries — pandas, PySpark, or equivalent
Basic knowledge of columnar storage, SQL-based querying, and time-series analytics (ClickHouse or equivalent)
Familiarity with Dataplex for data discovery, lineage, policy tagging, and data quality rule management
Understanding of Change Data Capture (CDC) patterns using Datastream for replicating transactional data into BigQuery (an illustrative merge step follows this list)
Understanding of bitemporal data modeling concepts (valid-time and system-time) and the challenges of implementing them within BigQuery's append-optimised design
Understanding of financial reference data — equities, fixed income identifiers, corporate actions, or index composition data
Familiarity with BigQuery cost management — slot reservations, query cost controls, and workload isolation using reservations and assignments
Exposure to CI/CD pipelines and infrastructure-as-code using Terraform for data platform deployments on GCP
Prior experience or projects involving LLMs and Agentic AI — particularly using Vertex AI for AI-assisted data quality, anomaly detection, semantic search, or natural language querying over structured datasets — is a strong plus
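To illustrate the CDC pattern mentioned above: once Datastream has landed change records in a staging table, a periodic MERGE typically folds the latest change per key into the serving table. All dataset, table, and column names below are illustrative assumptions, not a real schema.

```python
# Sketch: fold Datastream CDC change records from a staging table into a
# serving table. Names and columns (op, change_ts, etc.) are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

MERGE_SQL = """
MERGE `my-project.market_data.securities` AS t
USING (
  -- keep only the most recent change record per key
  SELECT * EXCEPT (rn)
  FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
             PARTITION BY security_id ORDER BY change_ts DESC) AS rn
    FROM `my-project.market_data.securities_cdc_staging` AS s)
  WHERE rn = 1
) AS c
ON t.security_id = c.security_id
WHEN MATCHED AND c.op = 'DELETE' THEN
  DELETE
WHEN MATCHED THEN
  UPDATE SET name = c.name, isin = c.isin
WHEN NOT MATCHED AND c.op != 'DELETE' THEN
  INSERT (security_id, name, isin) VALUES (c.security_id, c.name, c.isin)
"""

client.query(MERGE_SQL).result()  # runs synchronously; raises on SQL errors
```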