Member of Technical Staff, Machine Learning Datasets

Runway

Location:
United States

Contract Type:
Not provided

Salary:

270000.00 - 370000.00 USD / Year

Job Description:

We are building AI to simulate the world by merging art and science. We believe that world models are at the frontier of progress in artificial intelligence. Language models alone won’t solve the world’s hardest problems – robotics, disease, scientific discovery. Real progress requires models that experience the world and learn from their mistakes, the same way that humans do. And this kind of trial and error can be massively accelerated when done in simulation, rather than in the real world. World models offer the clearest path to general-purpose simulation, changing how stories are told, how scientific progress is made and how the next frontiers of humanity are reached.

Job Responsibility:

  • Develop and maintain large-scale, multimodal datasets for training and evaluating models
  • Optimize models for data preprocessing tasks
  • Create and run evaluations and benchmark analyses for datasets and models
  • Implement fast iteration cycles and feedback loops to continuously improve model datasets
  • Work with a world-class research team to push the boundaries of content creation
  • Evaluate new datasets and models for upstream data tasks that feed into our products

Requirements:

  • 4+ years of relevant experience in machine learning or dataset engineering, ideally with multimodal datasets
  • Experience with running and optimizing models offline at large scale
  • Excellent data modeling skills and experience with data curation
  • Proficiency in model finetuning and optimization for data preprocessing
  • Strong data analysis and SQL skills
  • Experience in creating evaluations and running benchmark analyses
  • Solid knowledge of at least one machine learning framework (e.g. PyTorch, JAX, TensorFlow)
  • Very strong programming skills and ability to write clean and maintainable code
  • Deep interest in building human-in-the-loop systems for creativity
  • Ability to rapidly prototype solutions and iterate on them with tight product deadlines
  • Strong familiarity with tools such as Ray, Kubernetes, Airflow, Prefect
  • Excellent communication, collaboration, and documentation skills

Additional Information:

Job Posted:
January 10, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Member of Technical Staff, Machine Learning Datasets

Staff Machine Learning Engineer

We are seeking a Staff Machine Learning Engineer to join our Foundation AI team....
Location:
United States, Boston
Salary:
170000.00 - 230000.00 USD / Year
Whoop
Expiration Date
Until further notice
Requirements
  • Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, Electrical Engineering, or a related field, or equivalent professional experience
  • 7+ years of experience in applied ML, AI research, or large-scale modeling, with a track record of delivering production systems
  • Expertise in modern deep learning (e.g., transformers, state space models) and multimodal model training
  • Proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow)
  • Experience building and scaling large datasets and training large models in distributed compute environments
  • Strong applied experience with representation learning, self-supervised methods, and fine-tuning for downstream applications
  • Familiarity with MLOps best practices including model versioning, evaluation, CI/CD for ML, and cloud-based compute
  • Excellent communication skills and ability to collaborate cross-functionally with engineers, researchers, and product teams
  • Passion for WHOOP’s mission to improve human performance and extend healthspan through science and technology
Job Responsibility
  • Design, train, and optimize large-scale multimodal foundation models that integrate wearable sensor data, text, biomarkers, and behavioral data
  • Conduct applied research in self-supervised learning, representation learning, and downstream task fine tuning to advance WHOOP’s core model capabilities
  • Develop scalable, distributed training pipelines for large models on high-performance compute environments
  • Collaborate with MLOps, data engineering, and software engineering teams to operationalize models for production deployment, ensuring robustness, reproducibility, and observability
  • Partner with product and research teams to translate foundation model capabilities into downstream features that deliver meaningful member value
  • Contribute to the technical roadmap and architectural direction for foundation model development at WHOOP
  • Serve as a technical mentor for other data scientists, sharing best practices in deep learning, large-scale training, and multimodal data integration
  • Ensure models adhere to WHOOP’s standards for ethical, transparent, and privacy-preserving AI
What we offer
  • competitive base salaries
  • meaningful equity
  • benefits
  • generous equity package
  • Fulltime

Member of Technical Staff, AI Training Infrastructure

As a Training Infrastructure Engineer, you'll design, build, and optimize the in...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
  • 3+ years of experience with distributed systems and ML infrastructure
  • Experience with PyTorch
  • Proficiency in cloud platforms (AWS, GCP, Azure)
  • Experience with containerization, orchestration (Kubernetes, Docker)
  • Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Job Responsibility
  • Design and implement scalable infrastructure for large-scale model training workloads
  • Develop and maintain distributed training pipelines for LLMs and multimodal models
  • Optimize training performance across multiple GPUs, nodes, and data centers
  • Implement monitoring, logging, and debugging tools for training operations
  • Architect and maintain data storage solutions for large-scale training datasets
  • Automate infrastructure provisioning, scaling, and orchestration for model training
  • Collaborate with researchers to implement and optimize training methodologies
  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
  • Troubleshoot complex performance issues in distributed training environments
What we offer
  • meaningful equity in a fast-growing startup
  • comprehensive benefits package
  • Fulltime

Member of Technical Staff - ML Research Engineer, Data

Our Data team powers Liquid Foundation Models across pre-training, vision, audio...
Location:
United States, San Francisco; Boston
Salary:
Not provided
Liquid AI
Expiration Date
Until further notice
Requirements
  • Strong Python skills with the ability to quickly comprehend problems and translate them into clean, working code
  • Solid ML fundamentals: experience training, evaluating, and iterating on models (PyTorch preferred)
  • Track record of learning new technical domains quickly
  • 3+ years relevant experience with an M.S., or 1+ year with a Ph.D. (5+ years with a B.S.)
Job Responsibility
  • Build and maintain data processing, filtering, and selection pipelines at scale
  • Create pipelines for pretraining, midtraining, SFT, and preference optimization datasets
  • Design synthetic data generation systems using LLMs, structured prompting, and domain-specific generators
  • Design and run evaluations and ablations to measure each dataset's impact on model performance
  • Monitor public datasets across text, vision, and audio domains
  • Collaborate with pre-training, vision, and audio teams on modality-specific data needs
What we offer
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
  • Fulltime

Member of Technical Staff, MLE

This is not a typical “Applied Scientist” or “ML Engineer” role. As a Member of ...
Location:
Singapore
Salary:
Not provided
Cohere
Expiration Date
Until further notice
Requirements
  • Strong ML fundamentals and the ability to frame complex, ambiguous problems as ML solutions
  • Fluency with Python and core ML/LLM frameworks
  • Experience working with (or the ability to learn) large-scale datasets and distributed training or inference pipelines
  • Understanding of LLM architectures, tuning techniques (CPT, post-training), and evaluation methodologies
  • Demonstrated ability to meaningfully shape LLM performance
  • A broad view of the ML research landscape and a desire to push the state of the art
  • Bias toward action, high ownership, and comfort with ambiguity
  • Humility and strong collaboration instincts
  • A deep conviction that AI should meaningfully empower people and organizations
Job Responsibility
  • Contribute to the design and delivery of custom LLM solutions for enterprise customers
  • Translate ambiguous business problems into well-framed ML problems with clear success criteria and evaluation methodologies
  • Build custom models using Cohere’s foundation model stack, CPT recipes, post-training pipelines (including RLVR), and data assets
  • Develop SOTA modeling techniques that directly enhance model performance for customer use-cases
  • Contribute improvements back to the foundation-model stack — including new capabilities, tuning strategies, and evaluation frameworks
  • Work as part of Cohere’s customer-facing MLE team to identify high-value opportunities where LLMs can unlock transformative impact for our enterprise customers
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Member of Technical Staff, MLE

Our mission is to scale intelligence to serve humanity. We’re training and deplo...
Location:
United States; Canada, San Francisco; New York; Toronto; Montreal
Salary:
Not provided
Cohere
Expiration Date
Until further notice
Requirements
  • Strong ML fundamentals and the ability to frame complex, ambiguous problems as ML solutions
  • Fluency with Python and core ML/LLM frameworks
  • Experience working with (or the ability to learn) large-scale datasets and distributed training or inference pipelines
  • Understanding of LLM architectures, tuning techniques (CPT, post-training), and evaluation methodologies
  • Demonstrated ability to meaningfully shape LLM performance
  • A broad view of the ML research landscape and a desire to push the state of the art
  • Bias toward action, high ownership, and comfort with ambiguity
  • Humility and strong collaboration instincts
  • A deep conviction that AI should meaningfully empower people and organizations
Job Responsibility
  • Contribute to the design and delivery of custom LLM solutions for enterprise customers
  • Translate ambiguous business problems into well-framed ML problems with clear success criteria and evaluation methodologies
  • Build custom models using Cohere’s foundation model stack, CPT recipes, post-training pipelines (including RLVR), and data assets
  • Develop SOTA modeling techniques that directly enhance model performance for customer use-cases
  • Contribute improvements back to the foundation-model stack — including new capabilities, tuning strategies, and evaluation frameworks
  • Work as part of Cohere’s customer-facing MLE team to identify high-value opportunities where LLMs can unlock transformative impact for our enterprise customers
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Member of Technical Staff, Pre-Training Data

As a Machine Learning Engineer specializing in pretraining data, you will play a...
Location:
Salary:
Not provided
Cohere
Expiration Date
Until further notice
Requirements
  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with curriculum learning, data mixing and data attribution
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora
  • Knowledge of data quality assessment techniques and experimentation with data mixtures
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
Job Responsibility
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Member of Technical Staff, Data Research Engineer

We are seeking Data Research Engineers to join our Multimodal team, where we are...
Location:
United Kingdom, London
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in AI, Computer Science, Data Science, Statistics, Physics, Engineering, or a related technical field AND technical engineering experience with coding in languages including, but not limited to, Python and common data libraries (Pandas, NumPy, etc.)
  • OR equivalent experience
  • Experience in data analysis or data engineering
  • Proficiency in statistics and exploratory data analysis methods
  • Ability to communicate technical findings effectively to research and product teams
Job Responsibility
  • Create high-quality datasets for training and evaluation
  • Run experiments on new datasets (data ablations) to assess their impact and determine the most effective data
  • Develop and maintain scalable data pipelines for multimodal ingestion, pre-processing, filtering, and annotation
  • Analyse real-world multimodal datasets to assess quality, diversity, relevance, and identify areas for improvement
  • Build lightweight tools and workflows for dataset auditing, visualization, and versioning
  • Collaborate with Safety, Ethics, and Governance teams to ensure datasets meet standards for quality, privacy, and responsible AI practices
  • Fulltime

Member of Technical Staff - Forward Deployed Engineer

You will work directly on customer engagements that generate revenue. This is ha...
Location:
United States, San Francisco, Boston
Salary:
Not provided
Liquid AI
Expiration Date
Until further notice
Requirements
  • Hands-on fine-tuning experience with modern LLMs (last 12-18 months): LoRA, PEFT, DPO, instruction tuning, or similar
  • Strong ML fundamentals: you understand how models learn, fail, and improve
  • Experience generating or curating training data to address model gaps
  • Autonomous coding and debugging skills in Python and PyTorch
  • Proficiency with open-source ML ecosystem (Hugging Face transformers, datasets, accelerate)
  • Fine-tunes models: You have hands-on experience with techniques like LoRA, PEFT, DPO, instruction tuning, or RLHF. You've written training loops, not just API calls
  • Works with modern architectures: Your experience includes models released in the last 12-18 months (Llama 3.x, Mistral, Gemma, Qwen, etc.), not just BERT or classical ML
  • Generates and curates data: You've created synthetic training data to address specific model failure modes. You understand how data quality drives model performance
  • Debugs methodically: When a model underperforms, you diagnose whether it's a data problem, architecture problem, or training problem, and you fix it
  • Ships to customers: You can translate ambiguous customer requirements into concrete technical specs and deliver against quality metrics
Job Responsibility
  • Fine-tune LFMs on customer data to hit quality and latency targets for on-device and edge deployments
  • Generate and curate training data to address specific model failure modes
  • Run experiments, track metrics, and iterate until customer success criteria are met
  • Translate ambiguous customer requirements into concrete technical specifications
  • Provide analytics to commercial teams for contract structuring and pricing
  • Work across text, vision, and audio modalities as customer needs require
What we offer
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
  • Fulltime