CrawlJobs Logo

Member of Technical Staff, Machine Learning Datasets

runwayml.com Logo

Runway

Location Icon

Location:
United States

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

270000.00 - 370000.00 USD / Year

Job Description:

We are building AI to simulate the world through merging art and science. We believe that world models are at the frontier of progress in artificial intelligence. Language models alone won’t solve the world’s hardest problems – robotics, disease, scientific discovery. Real progress requires models that experience the world and learn from their mistakes, the same way that humans do. And this kind of trial and error can be massively accelerated when done in simulation, rather than in the real world. World models offer the most clear path to general-purpose simulation, changing how stories are told, how scientific progress is made and how the next frontiers of humanity are reached.

Job Responsibility:

  • Develop and maintain large-scale, multimodal datasets for training and evaluating models
  • Optimize models for data preprocessing tasks
  • Create and run evaluations and benchmark analyses for datasets and models
  • Implement fast iteration cycles and feedback loops to continuously improve model datasets
  • Work with a world-class research team to push the boundaries of content creation
  • Evaluate new datasets and models for upstream data tasks that feed into our products

Requirements:

  • 4+ years of relevant experience in machine learning or dataset engineering, ideally with multimodal datasets
  • Experience with running and optimizing models offline at large scale
  • Excellent data modeling skills and experience with data curation
  • Proficiency in model finetuning and optimization for data preprocessing
  • Strong data analysis and SQL skills
  • Experience in creating evaluations and running benchmark analyses
  • Solid knowledge of at least one machine learning framework (e.g. PyTorch, JAX, TensorFlow)
  • Very strong programming skills and ability to write clean and maintainable code
  • Deep interest in building human-in-the-loop systems for creativity
  • Ability to rapidly prototype solutions and iterate on them with tight product deadlines
  • Strong familiarity with tools such as Ray, Kubernetes, Airflow, Prefect
  • Excellent communication, collaboration, and documentation skills

Additional Information:

Job Posted:
January 10, 2026

Employment Type:
Fulltime
Work Type:
Remote work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Member of Technical Staff, Machine Learning Datasets

Staff Machine Learning Engineer

We are seeking a Staff Machine Learning Engineer to join our Foundation AI team....
Location
Location
United States , Boston
Salary
Salary:
170000.00 - 230000.00 USD / Year
whoop.com Logo
Whoop
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, Electrical Engineering, or a related field, or equivalent professional experience
  • 7+ years of experience in applied ML, AI research, or large-scale modeling, with a track record of delivering production systems
  • Expertise in modern deep learning (e.g., transformers, state space models) and multimodal model training
  • Proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow)
  • Experience building and scaling large datasets and training large models in distributed compute environments
  • Strong applied experience with representation learning, self-supervised methods, and fine-tuning for downstream applications
  • Familiarity with MLOps best practices including model versioning, evaluation, CI/CD for ML, and cloud-based compute
  • Excellent communication skills and ability to collaborate cross-functionally with engineers, researchers, and product teams
  • Passion for WHOOP’s mission to improve human performance and extend healthspan through science and technology
Job Responsibility
Job Responsibility
  • Design, train, and optimize large-scale multimodal foundation models that integrate wearable sensor data, text, biomarkers, and behavioral data
  • Conduct applied research in self-supervised learning, representation learning, and downstream task fine tuning to advance WHOOP’s core model capabilities
  • Develop scalable, distributed training pipelines for large models on high-performance compute environments
  • Collaborate with MLOps, data engineering, and software engineering teams to operationalize models for production deployment, ensuring robustness, reproducibility, and observability
  • Partner with product and research teams to translate foundation model capabilities into downstream features that deliver meaningful member value
  • Contribute to the technical roadmap and architectural direction for foundation model development at WHOOP
  • Serve as a technical mentor for other data scientists, sharing best practices in deep learning, large-scale training, and multimodal data integration
  • Ensure models adhere to WHOOP’s standards for ethical, transparent, and privacy-preserving AI
What we offer
What we offer
  • competitive base salaries
  • meaningful equity
  • benefits
  • generous equity package
  • Fulltime
Read More
Arrow Right

Member of Technical Staff, AI Training Infrastructure

As a Training Infrastructure Engineer, you'll design, build, and optimize the in...
Location
Location
United States , San Mateo
Salary
Salary:
175000.00 - 220000.00 USD / Year
fireworks.ai Logo
Fireworks AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
  • 3+ years of experience with distributed systems and ML infrastructure
  • Experience with PyTorch
  • Proficiency in cloud platforms (AWS, GCP, Azure)
  • Experience with containerization, orchestration (Kubernetes, Docker)
  • Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Job Responsibility
Job Responsibility
  • Design and implement scalable infrastructure for large-scale model training workloads
  • Develop and maintain distributed training pipelines for LLMs and multimodal models
  • Optimize training performance across multiple GPUs, nodes, and data centers
  • Implement monitoring, logging, and debugging tools for training operations
  • Architect and maintain data storage solutions for large-scale training datasets
  • Automate infrastructure provisioning, scaling, and orchestration for model training
  • Collaborate with researchers to implement and optimize training methodologies
  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
  • Troubleshoot complex performance issues in distributed training environments
What we offer
What we offer
  • meaningful equity in a fast-growing startup
  • comprehensive benefits package
  • Fulltime
Read More
Arrow Right
New

Data Scientist

The Data Scientist plays a pivotal role in planning, executing, and delivering m...
Location
Location
United States , Camden
Salary
Salary:
Not provided
nttdata.com Logo
NTT DATA
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Master’s, or PhD in Computer Science, Data Science, Engineering, Statistics, Applied Mathematics, Operations Research, or a related quantitative field
  • Specialization in ML, AI, cognitive science, or data science is highly preferred
  • 3-5 years of hands-on experience planning and executing end-to-end data science projects with demonstrated impact on clinical or operational outcomes in business environments
  • Advanced programming proficiency in Python or R with strong expertise in machine learning frameworks (scikit-learn, TensorFlow, PyTorch) and statistical analysis tools
  • Expertise in machine learning and statistical techniques including supervised/unsupervised learning, deep learning, NLP, computer vision, regression models, ensemble methods, and experimental design (A/B testing)
  • Strong data engineering capabilities including SQL/NoSQL database programming, distributed computing tools (Hadoop, Spark, Kafka), data pipeline development, and experience with cloud platforms (AWS, Azure, GCP)
  • Production ML and MLOps experience including model deployment, monitoring, containerization (Docker, Kubernetes), version control, and applying DevOps principles to data science workflows
  • Data visualization and communication excellence with ability to create compelling dashboards (Tableau, Power BI), translate complex technical findings into actionable insights, and present to diverse audiences from executives to frontline staff
  • Cross-functional collaboration skills with proven ability to work in agile environments, partner with stakeholders to align technical solutions with business objectives, and mentor junior team members
  • Healthcare domain knowledge preferred, particularly experience with Epic EHR systems, clinical workflows, and healthcare data standards, along with relevant certifications (Clarity /Caboodle, Google Cloud ML Engineer, AWS ML Specialist)
Job Responsibility
Job Responsibility
  • Collect, clean, and analyze datasets from diverse internal and external sources, applying advanced data wrangling techniques
  • Acquire access to various databases and source systems (SQL, NoSQL, graph databases) and create data pipelines
  • Apply statistical analysis and visualization techniques to explore and prepare data
  • Design, develop, and validate machine learning, statistical, and optimization models
  • Select appropriate algorithms and models for AI/ML and test them for accuracy, robustness, and fairness
  • Perform feature selection and engineering
  • Integrate domain knowledge into ML solutions
  • Conduct controlled experiments (A/B and multivariate testing)
  • Collaborate with MLOps, data engineers, and IT to evaluate deployment options
  • Continuously monitor execution and health of production ML models
  • Fulltime
Read More
Arrow Right

Member of Technical Staff - ML Research Engineer, Data

Our Data team powers Liquid Foundation Models across pre-training, vision, audio...
Location
Location
United States , San Francisco; Boston
Salary
Salary:
Not provided
liquid.ai Logo
Liquid AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong Python skills with the ability to quickly comprehend problems and translate them into clean, working code
  • Solid ML fundamentals: experience training, evaluating, and iterating on models (PyTorch preferred)
  • Track record of learning new technical domains quickly
  • 3+ years relevant experience with an M.S., or 1+ year with a Ph.D. (5+ years with a B.S.)
Job Responsibility
Job Responsibility
  • Build and maintain data processing, filtering, and selection pipelines at scale
  • Create pipelines for pretraining, midtraining, SFT, and preference optimization datasets
  • Design synthetic data generation systems using LLMs, structured prompting, and domain-specific generators
  • Design and run evaluations and ablations to measure dataset's impact on model performance
  • Monitor public datasets across text, vision, and audio domains
  • Collaborate with pre-training, vision, and audio teams on modality-specific data needs
What we offer
What we offer
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
  • Fulltime
Read More
Arrow Right

Member of Technical Staff, MLE

This is not a typical “Applied Scientist” or “ML Engineer” role. As a Member of ...
Location
Location
Singapore
Salary
Salary:
Not provided
cohere.com Logo
Cohere
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong ML fundamentals and the ability to frame complex, ambiguous problems as ML solutions
  • Fluency with Python and core ML/LLM frameworks
  • Experience working with (or the ability to learn) large-scale datasets and distributed training or inference pipelines
  • Understanding of LLM architectures, tuning techniques (CPT, post-training), and evaluation methodologies
  • Demonstrated ability to meaningfully shape LLM performance
  • A broad view of the ML research landscape and a desire to push the state of the art
  • Bias toward action, high ownership, and comfort with ambiguity
  • Humility and strong collaboration instincts
  • A deep conviction that AI should meaningfully empower people and organizations
Job Responsibility
Job Responsibility
  • Contribute to the design and delivery of custom LLM solutions for enterprise customers
  • Translate ambiguous business problems into well-framed ML problems with clear success criteria and evaluation methodologies
  • Build custom models using Cohere’s foundation model stack, CPT recipes, post-training pipelines (including RLVR), and data assets
  • Develop SOTA modeling techniques that directly enhance model performance for customer use-cases
  • Contribute improvements back to the foundation-model stack — including new capabilities, tuning strategies, and evaluation frameworks
  • Work as part of Cohere’s customer facing MLE team to identify high-value opportunities where LLMs can unlock transformative impact to our enterprise customers
What we offer
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime
Read More
Arrow Right

Member of Technical Staff, MLE

Our mission is to scale intelligence to serve humanity. We’re training and deplo...
Location
Location
United States; Canada , San Francisco; New York; Toronto; Montreal
Salary
Salary:
Not provided
cohere.com Logo
Cohere
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong ML fundamentals and the ability to frame complex, ambiguous problems as ML solutions
  • Fluency with Python and core ML/LLM frameworks
  • Experience working with (or the ability to learn) large-scale datasets and distributed training or inference pipelines
  • Understanding of LLM architectures, tuning techniques (CPT, post-training), and evaluation methodologies
  • Demonstrated ability to meaningfully shape LLM performance
  • A broad view of the ML research landscape and a desire to push the state of the art
  • Bias toward action, high ownership, and comfort with ambiguity
  • Humility and strong collaboration instincts
  • A deep conviction that AI should meaningfully empower people and organizations
Job Responsibility
Job Responsibility
  • Contribute to the design and delivery of custom LLM solutions for enterprise customers
  • Translate ambiguous business problems into well-framed ML problems with clear success criteria and evaluation methodologies
  • Build custom models using Cohere’s foundation model stack, CPT recipes, post-training pipelines (including RLVR), and data assets
  • Develop SOTA modeling techniques that directly enhance model performance for customer use-cases
  • Contribute improvements back to the foundation-model stack — including new capabilities, tuning strategies, and evaluation frameworks
  • Work as part of Cohere’s customer facing MLE team to identify high-value opportunities where LLMs can unlock transformative impact to our enterprise customers
What we offer
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime
Read More
Arrow Right

Member of Technical Staff, Pre-Training Data

As a Machine Learning Engineer specializing in pretraining data, you will play a...
Location
Location
Salary
Salary:
Not provided
cohere.com Logo
Cohere
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with curriculum learning, data mixing and data attribution
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora
  • Knowledge of data quality assessment techniques and experimentation with data mixtures
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
Job Responsibility
Job Responsibility
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models
What we offer
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime
Read More
Arrow Right

Member of Technical Staff

The Microsoft AI Superintelligence (MAIST) Post Training team is dedicated to ad...
Location
Location
United States , Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Doctorate OR equivalent experience
  • Significant experience in large-scale model training, data curation, and hands-on coding, ideally from leading research labs
  • Deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Ability to develop LLMs, SLMs, multimodal, and coding models using both proprietary and open-source frameworks
  • Self-driven, able to write efficient code and debug training jobs, document findings, and demonstrate a track record in these fields
  • Curious, adaptable problem-solver who thrives on continuous learning, embraces changing priorities, and is motivated by creating meaningful impact
Job Responsibility
Job Responsibility
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models
  • run ablation studies to measure impact and optimize data effectiveness
  • Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
  • Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance
  • identify gaps and propose improvements
  • Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
  • Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
  • Embody our Culture and Values
  • Fulltime
Read More
Arrow Right