CrawlJobs

Member of Technical Staff, Machine Learning Datasets


Runway


Location:
United States


Contract Type:
Not provided


Salary:
270000.00 - 370000.00 USD / Year

Job Description:

We are building AI to simulate the world by merging art and science. We believe that world models are at the frontier of progress in artificial intelligence. Language models alone won’t solve the world’s hardest problems – robotics, disease, scientific discovery. Real progress requires models that experience the world and learn from their mistakes, the same way humans do. And this kind of trial and error can be massively accelerated in simulation rather than in the real world. World models offer the clearest path to general-purpose simulation, changing how stories are told, how scientific progress is made, and how the next frontiers of humanity are reached.

Job Responsibility:

  • Develop and maintain large-scale, multimodal datasets for training and evaluating models
  • Optimize models for data preprocessing tasks
  • Create and run evaluations and benchmark analyses for datasets and models
  • Implement fast iteration cycles and feedback loops to continuously improve model datasets
  • Work with a world-class research team to push the boundaries of content creation
  • Evaluate new datasets and models for upstream data tasks that feed into our products

Requirements:

  • 4+ years of relevant experience in machine learning or dataset engineering, ideally with multimodal datasets
  • Experience with running and optimizing models offline at large scale
  • Excellent data modeling skills and experience with data curation
  • Proficiency in model finetuning and optimization for data preprocessing
  • Strong data analysis and SQL skills
  • Experience in creating evaluations and running benchmark analyses
  • Solid knowledge of at least one machine learning framework (e.g. PyTorch, JAX, TensorFlow)
  • Very strong programming skills and ability to write clean and maintainable code
  • Deep interest in building human-in-the-loop systems for creativity
  • Ability to rapidly prototype solutions and iterate on them with tight product deadlines
  • Strong familiarity with tools such as Ray, Kubernetes, Airflow, Prefect
  • Excellent communication, collaboration, and documentation skills

Additional Information:

Job Posted:
January 10, 2026

Employment Type:
Full-time
Work Type:
Remote work


Similar Jobs for Member of Technical Staff, Machine Learning Datasets

Staff Machine Learning Engineer

We are seeking a Staff Machine Learning Engineer to join our Foundation AI team....
Location:
United States, Boston
Salary:
170000.00 - 230000.00 USD / Year
Whoop
Expiration Date:
Until further notice
Requirements:
  • Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, Electrical Engineering, or a related field, or equivalent professional experience
  • 7+ years of experience in applied ML, AI research, or large-scale modeling, with a track record of delivering production systems
  • Expertise in modern deep learning (e.g., transformers, state space models) and multimodal model training
  • Proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow)
  • Experience building and scaling large datasets and training large models in distributed compute environments
  • Strong applied experience with representation learning, self-supervised methods, and fine-tuning for downstream applications
  • Familiarity with MLOps best practices including model versioning, evaluation, CI/CD for ML, and cloud-based compute
  • Excellent communication skills and ability to collaborate cross-functionally with engineers, researchers, and product teams
  • Passion for WHOOP’s mission to improve human performance and extend healthspan through science and technology
Job Responsibility:
  • Design, train, and optimize large-scale multimodal foundation models that integrate wearable sensor data, text, biomarkers, and behavioral data
  • Conduct applied research in self-supervised learning, representation learning, and downstream task fine tuning to advance WHOOP’s core model capabilities
  • Develop scalable, distributed training pipelines for large models on high-performance compute environments
  • Collaborate with MLOps, data engineering, and software engineering teams to operationalize models for production deployment, ensuring robustness, reproducibility, and observability
  • Partner with product and research teams to translate foundation model capabilities into downstream features that deliver meaningful member value
  • Contribute to the technical roadmap and architectural direction for foundation model development at WHOOP
  • Serve as a technical mentor for other data scientists, sharing best practices in deep learning, large-scale training, and multimodal data integration
  • Ensure models adhere to WHOOP’s standards for ethical, transparent, and privacy-preserving AI
What we offer:
  • competitive base salaries
  • meaningful equity
  • benefits
  • generous equity package
  • Full-time

Member of Technical Staff, AI Training Infrastructure

As a Training Infrastructure Engineer, you'll design, build, and optimize the in...
Location:
United States, San Mateo
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
  • 3+ years of experience with distributed systems and ML infrastructure
  • Experience with PyTorch
  • Proficiency in cloud platforms (AWS, GCP, Azure)
  • Experience with containerization, orchestration (Kubernetes, Docker)
  • Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Job Responsibility:
  • Design and implement scalable infrastructure for large-scale model training workloads
  • Develop and maintain distributed training pipelines for LLMs and multimodal models
  • Optimize training performance across multiple GPUs, nodes, and data centers
  • Implement monitoring, logging, and debugging tools for training operations
  • Architect and maintain data storage solutions for large-scale training datasets
  • Automate infrastructure provisioning, scaling, and orchestration for model training
  • Collaborate with researchers to implement and optimize training methodologies
  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
  • Troubleshoot complex performance issues in distributed training environments
What we offer:
  • meaningful equity in a fast-growing startup
  • comprehensive benefits package
  • Full-time

Data Scientist

The Data Scientist plays a pivotal role in planning, executing, and delivering m...
Location:
United States, Camden
Salary:
Not provided
NTT DATA
Expiration Date:
Until further notice
Requirements:
  • Master’s or PhD in Computer Science, Data Science, Engineering, Statistics, Applied Mathematics, Operations Research, or a related quantitative field
  • Specialization in ML, AI, cognitive science, or data science is highly preferred
  • 3-5 years of hands-on experience planning and executing end-to-end data science projects with demonstrated impact on clinical or operational outcomes in business environments
  • Advanced programming proficiency in Python or R with strong expertise in machine learning frameworks (scikit-learn, TensorFlow, PyTorch) and statistical analysis tools
  • Expertise in machine learning and statistical techniques including supervised/unsupervised learning, deep learning, NLP, computer vision, regression models, ensemble methods, and experimental design (A/B testing)
  • Strong data engineering capabilities including SQL/NoSQL database programming, distributed computing tools (Hadoop, Spark, Kafka), data pipeline development, and experience with cloud platforms (AWS, Azure, GCP)
  • Production ML and MLOps experience including model deployment, monitoring, containerization (Docker, Kubernetes), version control, and applying DevOps principles to data science workflows
  • Data visualization and communication excellence with ability to create compelling dashboards (Tableau, Power BI), translate complex technical findings into actionable insights, and present to diverse audiences from executives to frontline staff
  • Cross-functional collaboration skills with proven ability to work in agile environments, partner with stakeholders to align technical solutions with business objectives, and mentor junior team members
  • Healthcare domain knowledge preferred, particularly experience with Epic EHR systems, clinical workflows, and healthcare data standards, along with relevant certifications (Clarity /Caboodle, Google Cloud ML Engineer, AWS ML Specialist)
Job Responsibility:
  • Collect, clean, and analyze datasets from diverse internal and external sources, applying advanced data wrangling techniques
  • Acquire access to various databases and source systems (SQL, NoSQL, graph databases) and create data pipelines
  • Apply statistical analysis and visualization techniques to explore and prepare data
  • Design, develop, and validate machine learning, statistical, and optimization models
  • Select appropriate algorithms and models for AI/ML and test them for accuracy, robustness, and fairness
  • Perform feature selection and engineering
  • Integrate domain knowledge into ML solutions
  • Conduct controlled experiments (A/B and multivariate testing)
  • Collaborate with MLOps, data engineers, and IT to evaluate deployment options
  • Continuously monitor execution and health of production ML models
  • Full-time

Staff Machine Learning Engineer, Credit Products (Square Financial Services)

Block is one company built from many blocks, all united by the same purpose of e...
Location:
United States, Bay Area
Salary:
Not provided
Block
Expiration Date:
Until further notice
Requirements:
  • Minimum of 8 years of related experience with a Bachelor's degree, 6 years with a Master's degree, or 3 years with a PhD, with a focus on developing and deploying machine learning and statistical models in production environments
  • A degree in a technical field (e.g., Computer Science, Mathematics, Statistics, Physics, or Engineering)
  • Strong quantitative intuition and data visualization skills, with a proven ability to conduct sophisticated ad-hoc and exploratory analysis
  • Full-stack proficiency preferred, including the ability to contribute across the entire technical stack—from data pipelines to production-grade software architecture
  • The versatility to communicate clearly with both technical and non-technical audiences, particularly in the context of high-visibility projects and executive stakeholders
  • A pragmatic approach to problem-solving, with a willingness to utilize whichever tool is most appropriate for the situation while balancing complex business, technical, and regulatory constraints
  • Experience with tree-based models and gradient boosting is helpful but not required; we value the ability to adapt and learn new methodologies as the credit landscape evolves
Job Responsibility:
  • Apply a rigorous scientific mindset to the challenge of underwriting new customer segments, involving the evaluation of alternative external data sources and the deployment of advanced architectures to enhance predictive accuracy
  • Lead complex ML Operations and Infrastructure initiatives that advance our modeling capabilities, such as scaling data ingestion or enabling the use of more complex neural networks
  • Design and implement the full credit modeling stack, taking responsibility for the entire lifecycle of credit decisioning and ensuring models are robustly integrated into production environments
  • Use data science techniques to leverage new data sources for modeling, making sense of messy datasets and bringing clarity to business decisions
  • Identify and execute material improvements to credit policy, applying an analytical lens to determine where technical or logic shifts can yield significant positive outcomes for the customer and the bank's portfolio
  • Support team members in ad-hoc and scheduled updates to existing models, and help troubleshoot issues in a real-time production environment
  • Operate effectively within the framework of a regulated bank (SFS), balancing rapid innovation with the requirements of safety, soundness, and compliance
What we offer:
  • Remote work
  • Medical insurance
  • Flexible time off
  • Retirement savings plans
  • Modern family planning
  • Full-time

Member of Technical Staff - ML Research Engineer, Data

Our Data team powers Liquid Foundation Models across pre-training, vision, audio...
Location:
United States, San Francisco; Boston
Salary:
Not provided
Liquid AI
Expiration Date:
Until further notice
Requirements:
  • Strong Python skills with the ability to quickly comprehend problems and translate them into clean, working code
  • Solid ML fundamentals: experience training, evaluating, and iterating on models (PyTorch preferred)
  • Track record of learning new technical domains quickly
  • 3+ years relevant experience with an M.S., or 1+ year with a Ph.D. (5+ years with a B.S.)
Job Responsibility:
  • Build and maintain data processing, filtering, and selection pipelines at scale
  • Create pipelines for pretraining, midtraining, SFT, and preference optimization datasets
  • Design synthetic data generation systems using LLMs, structured prompting, and domain-specific generators
  • Design and run evaluations and ablations to measure datasets' impact on model performance
  • Monitor public datasets across text, vision, and audio domains
  • Collaborate with pre-training, vision, and audio teams on modality-specific data needs
What we offer:
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
  • Full-time

Member of Technical Staff, MLE

This is not a typical “Applied Scientist” or “ML Engineer” role. As a Member of ...
Location:
Singapore
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Strong ML fundamentals and the ability to frame complex, ambiguous problems as ML solutions
  • Fluency with Python and core ML/LLM frameworks
  • Experience working with (or the ability to learn) large-scale datasets and distributed training or inference pipelines
  • Understanding of LLM architectures, tuning techniques (CPT, post-training), and evaluation methodologies
  • Demonstrated ability to meaningfully shape LLM performance
  • A broad view of the ML research landscape and a desire to push the state of the art
  • Bias toward action, high ownership, and comfort with ambiguity
  • Humility and strong collaboration instincts
  • A deep conviction that AI should meaningfully empower people and organizations
Job Responsibility:
  • Contribute to the design and delivery of custom LLM solutions for enterprise customers
  • Translate ambiguous business problems into well-framed ML problems with clear success criteria and evaluation methodologies
  • Build custom models using Cohere’s foundation model stack, CPT recipes, post-training pipelines (including RLVR), and data assets
  • Develop SOTA modeling techniques that directly enhance model performance for customer use-cases
  • Contribute improvements back to the foundation-model stack — including new capabilities, tuning strategies, and evaluation frameworks
  • Work as part of Cohere’s customer-facing MLE team to identify high-value opportunities where LLMs can unlock transformative impact for our enterprise customers
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Full-time

Member of Technical Staff, MLE

Our mission is to scale intelligence to serve humanity. We’re training and deplo...
Location:
United States; Canada, San Francisco; New York; Toronto; Montreal
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Strong ML fundamentals and the ability to frame complex, ambiguous problems as ML solutions
  • Fluency with Python and core ML/LLM frameworks
  • Experience working with (or the ability to learn) large-scale datasets and distributed training or inference pipelines
  • Understanding of LLM architectures, tuning techniques (CPT, post-training), and evaluation methodologies
  • Demonstrated ability to meaningfully shape LLM performance
  • A broad view of the ML research landscape and a desire to push the state of the art
  • Bias toward action, high ownership, and comfort with ambiguity
  • Humility and strong collaboration instincts
  • A deep conviction that AI should meaningfully empower people and organizations
Job Responsibility:
  • Contribute to the design and delivery of custom LLM solutions for enterprise customers
  • Translate ambiguous business problems into well-framed ML problems with clear success criteria and evaluation methodologies
  • Build custom models using Cohere’s foundation model stack, CPT recipes, post-training pipelines (including RLVR), and data assets
  • Develop SOTA modeling techniques that directly enhance model performance for customer use-cases
  • Contribute improvements back to the foundation-model stack — including new capabilities, tuning strategies, and evaluation frameworks
  • Work as part of Cohere’s customer-facing MLE team to identify high-value opportunities where LLMs can unlock transformative impact for our enterprise customers
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Full-time

Member of Technical Staff, Pre-Training Data

As a Machine Learning Engineer specializing in pretraining data, you will play a...
Location:
Not provided
Salary:
Not provided
Cohere
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with curriculum learning, data mixing, and data attribution
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora
  • Knowledge of data quality assessment techniques and experimentation with data mixtures
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
Job Responsibility:
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Full-time