Member of Technical Staff, Data Infrastructure

240000.00 - 290000.00 USD / Year · Job Posted January 20, 2026

Job Description

We're looking for a Data Engineer to build and scale the data infrastructure that powers Runway's AI research and business intelligence. You'll own critical data pipelines spanning production databases, analytics warehouses, and large-scale ML training datasets. This role sits at the intersection of data engineering, ML infrastructure, and analytics—you'll enable both world-class research and data-driven business decisions.

Job Responsibility

  • Build and own pipelines for the creation, curation, and processing of large-scale multimodal datasets, including vector database (LanceDB) management and query optimization for ML metadata
  • Build and own ETL and CDC streams from Postgres and ClickHouse to analytics warehouses
  • Build standardized data transformation layers using dbt to replace ad-hoc SQL queries and create maintainable data models for business analytics
  • Manage production databases (Postgres, ClickHouse) and optimize for performance and reliability

Requirements

  • 4+ years of industry experience in data engineering
  • Strong knowledge of Python
  • Experience with data quality, deduplication, and cleaning at scale
  • Comfortable working with cloud storage (S3) and managing large datasets
  • Experience building and maintaining ETL/CDC pipelines at scale
  • Strong SQL skills and experience with multiple database systems (Postgres, columnar databases like ClickHouse/Redshift)
  • Humility and open-mindedness

Nice to have

  • Experience with one or more frameworks for large-scale data processing (e.g., Spark, Ray) and one or more ML frameworks (e.g., PyTorch, JAX)
  • Knowledge of cloud platforms (AWS, GCP, or Azure) and their data service offerings
  • Knowledge of data privacy and data security best practices
  • Experience with business intelligence and visualization tools (e.g., Looker, Tableau, PowerBI, Metabase, or similar)
  • Experience in a high-growth startup environment or similar fast-paced setting

Similar Jobs for Member of Technical Staff, Data Infrastructure

8 matching positions

Member of Technical Staff, Infrastructure Data & Analytics

We are seeking experienced Infrastructure Data & Analytics Engineers to join our...
Location
United States, Multiple Locations; Mountain View; San Francisco Bay area; New York City metropolitan area
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in computer science, or related technical field AND 8+ years technical engineering experience with data engineering, analytics, or data science, with increasing technical ownership in a startup environment AND 6+ years experience with distributed data processing frameworks and large-scale data systems
  • OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with data engineering, analytics, or data science, with increasing technical ownership in a startup environment AND 10+ years experience with distributed data processing frameworks and large-scale data systems
  • OR equivalent experience
  • Proven technical leadership in data engineering, analytics platforms, or large-scale telemetry systems
  • Hands-on experience with ETL orchestration frameworks such as Airflow, Dagster, or similar
  • Strong communication skills; can explain complex systems clearly to senior leaders
Job Responsibility
  • Act as the technical lead and owner for infrastructure analytics across compute, storage, and networking
  • Design and build durable, scalable data pipelines that ingest telemetry from clusters, schedulers, health systems, and capacity trackers into Data Warehouse
  • Define and standardize core metrics and semantics (e.g., utilization, occupancy, MFU, goodput, capacity readiness, delivery-to-production)
  • Architect and maintain self-service dashboards and APIs for fleet, cluster, and squad-level visibility
  • Partner closely with stakeholders across Supercomputing Infra, Researchers, Strategy and Executives to ensure metrics reflect operational and business reality
  • Implement robust and fault-tolerant systems for data ingestion and processing
  • Lead data architecture and engineering decisions, applying strong technical judgment to proactively shape executive-level discussions and decisions
  • Identify data gaps and instrumentation issues; drive fixes by influencing upstream engineering teams
  • Establish data quality, validation, documentation, and governance so metrics are trusted and repeatable
  • Fulltime

Member of Technical Staff - Data Infra - MAI Superintelligence Team

Help build the world’s most advanced multimodal dataset at Microsoft AI. We are ...
Location
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling or data engineering work
  • OR Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ year(s) experience in business analytics, data science, software development, or data engineering work
  • OR equivalent experience
  • Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 8+ years experience in business analytics, data science, software development, data modeling or data engineering work
  • OR Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of business analytics, data science, software development, data modeling or data engineering work experience
  • OR equivalent experience
Job Responsibility
  • Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video)
  • Own and maintain critical data infrastructure, including Spark, Ray, vector databases, and others
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models
  • Partner with the pretraining and post-training teams to improve our data recipe by rigorous and careful experimentation
  • Embody our culture and values
  • Fulltime

Member of Technical Staff, Infrastructure Engineer

As Microsoft continues to push the boundaries of AI, we are on the lookout for p...
Location
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in computer science, or related technical discipline AND 4+ years technical engineering experience building services and products in languages such as Python, C#, C++, Rust, Java
  • OR equivalent experience
  • 4+ years’ experience building scalable platforms on public cloud infrastructure like Azure, AWS, or GCP with extensive use of technologies like Docker, Kubernetes, nginx, RDBMS, key-value stores, etc
  • 4+ years’ experience in building and releasing production software at the platform level
  • Solid knowledge of APIs, data flows, systems, and services
Job Responsibility
  • Design, develop, and maintain performant and secure AI Platform services that power Copilot
  • Work collaboratively with platform, infrastructure, application engineers, and AI researchers to build next generation AI products and services
  • Ship high-quality and maintainable code, and ensure the reliability, scalability, and performance of platform components
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values
  • Fulltime

Senior Member of technical staff (Infrastructure)

About the Team: The Infrastructure team aims to make it seamless for our researc...
Location
United Kingdom; France, London; Paris
Salary:
Not provided
H Company
Expiration Date
Until further notice
Requirements
  • Infrastructure as code (CDK, Terraform, ...)
  • Experience architecting and deploying distributed systems on public cloud (AWS, Azure, GCP)
  • Observability and monitoring (Datadog, Prometheus, Grafana, …)
  • Good knowledge of a modern programming language (ideally Python or JS/TypeScript)
Job Responsibility
  • Designing and managing the infrastructure to support Research efforts in Model and Agent development incl. training infrastructure, data pipelines and inference
  • Designing and managing the infrastructure to support Product Engineering efforts on H Company’s agent platform including client-facing APIs and agent runtimes within various deployment scenarios (multi-tenant and on-prem)
  • Set up and maintain observability and monitoring strategies
  • Mentor and grow other engineers in infrastructure-related topics as well as general engineering practices
What we offer
  • Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
  • Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
  • Enjoy a competitive salary
  • Unlock opportunities for professional growth, continuous learning, and career development
  • Fulltime

Member of technical staff (Infrastructure)

About H: H exists to push the boundaries of superintelligence with agentic AI. B...
Location
France; United Kingdom, Paris; London
Salary:
Not provided
H Company
Expiration Date
Until further notice
Requirements
  • Observability and monitoring (Datadog, Prometheus, Grafana, …)
  • Good knowledge of a modern programming language (ideally Python or JS/TypeScript)
Job Responsibility
  • Designing and managing the infrastructure to support Research efforts in Model and Agent development incl. training infrastructure, data pipelines and inference
  • Designing and managing the infrastructure to support Product Engineering efforts on H Company’s agent platform including client-facing APIs and agent runtimes within various deployment scenarios (multi-tenant and on-prem)
  • Set up and maintain observability and monitoring strategies
What we offer
  • Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
  • Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
  • Enjoy a competitive salary
  • Unlock opportunities for professional growth, continuous learning, and career development
  • Fulltime

Member of Technical Staff, Data Engineering

As a Data Engineer specializing in pretraining data, you will play a pivotal rol...
Location
Salary:
Not provided
Cohere
Expiration Date
Until further notice
Requirements
  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with large-scale web datasets like CommonCrawl
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
Job Responsibility
  • Design and build scalable data pipelines to ingest, parse, filter, and optimize diverse web datasets
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Member of Technical Staff, Data Analysis and Evaluation

As a Member of Technical Staff in Data Analysis and Evaluation, you will play a ...
Location
Salary:
Not provided
Cohere
Expiration Date
Until further notice
Requirements
  • Extremely strong software engineering skills
  • Strong expertise in designing and conducting data collection tasks, including working with human annotators
  • Strong statistical skills and experience evaluating scientific experiments related to data collection and model performance
  • Experience analysing datasets with respect to their quality, biases, and suitability for training ML models
  • Hands-on experience training large language models (LLMs) on distributed training infrastructures
  • Familiarity with evaluating and improving the generalisability and robustness of ML systems
  • Proficiency in programming languages such as Python and ML frameworks (e.g., PyTorch, TensorFlow, JAX)
  • Excellent communication skills to collaborate effectively with cross-functional teams and present findings
  • One or more papers at top-tier venues (such as NeurIPS, ICML, ICLR, AIStats, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP)
Job Responsibility
  • Design and oversee data collection tasks, including supporting human annotators and ensuring data quality
  • Develop and apply statistical methods to evaluate the quality and reliability of datasets
  • Analyse and assess the generalisability and robustness of ML systems across diverse use cases
  • Collaborate with teams to improve dataset quality and model performance
  • Train and fine-tune large language models (LLMs) on distributed training infrastructures
  • Conduct experiments to evaluate model performance and identify areas for improvement
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Member of Technical Staff - Data Platform

If you are excited by the challenge of designing distributed systems that proces...
Location
United States, Mountain View; Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 3+ years experience in business analytics, data science, software development, data modeling, or data engineering OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, data modeling, or data engineering OR equivalent experience
  • Proficiency in Python, Scala, Java, or Go
  • Deep Distributed Systems Knowledge: Demonstrated technical understanding of massive-scale compute engines (e.g., Apache Spark, Flink, Ray, Trino, or Snowflake)
  • Experience architecting Lakehouse environments at scale (using Delta Lake, Iceberg, or Hudi)
  • Experience building internal developer platforms or "Data-as-a-Service" APIs
  • Strong background in streaming technologies (Kafka, Azure EventHubs, Pulsar) and stateful stream processing
  • Experience with container orchestration (Kubernetes) for deploying data applications
  • Experience enabling AI/ML workloads (Feature Stores, Vector Databases)
Job Responsibility
  • Core Platform Engineering: Design and build the underlying frameworks (based on Spark/Databricks) that allow internal teams to process massive datasets efficiently
  • Distributed Systems Architecture: Modernize our data stack by moving from batch-heavy patterns to event-driven architectures
  • Unstructured AI Data Pipelines: Architect high-throughput pipelines capable of processing complex, non-tabular data (documents, code repositories, chat logs) for LLM pre-training, fine-tuning, and evaluation datasets
  • AI Feedback Loops: Engineer the high-throughput telemetry systems that capture user interactions with Copilot
  • Infrastructure as Code: Treat the data platform as software. Define and deploy all storage, compute, and networking resources using IaC (Bicep/Terraform)
  • Data Reliability Engineering: Move beyond simple "validation checks" to build automated governance and observability systems
  • Compute Optimization: Deep-dive into query execution plans and cluster performance. Optimize shuffle operations, partition strategies, and resource allocation
  • Fulltime