CrawlJobs Logo

Member of Technical Staff, Pre-Training Data

cohere.com Logo

Cohere

Location Icon

Location:

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

As a Machine Learning Engineer specializing in pretraining data, you will play a pivotal role in developing the data pipeline that underpins Cohere’s advanced language models. In this role, you will conduct data ablations to evaluate data quality and construct pre-training data mixtures to enhance model performance. By combining research and engineering, you will bridge the gap between raw data and cutting-edge AI models, directly contributing to improvements in critical training metrics like throughput and accelerator utilization. Your work will be essential to Cohere’s mission of delivering efficient and reliable language understanding and generation capabilities, driving innovation in natural language processing. If you are passionate about transforming data into the foundation of AI systems, this role offers a unique opportunity to make a meaningful impact.

Job Responsibility:

  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models

Requirements:

  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with curriculum learning, data mixing and data attribution
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora
  • Knowledge of data quality assessment techniques and experimentation with data mixtures
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training

Nice to have:

Bonus: paper at top-tier venues (such as NeurIPS, ICML, ICLR, AIStats, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP)

What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Remote work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Member of Technical Staff, Pre-Training Data

Member of Technical Staff, AI Multimodal

At Microsoft AI, we are on a mission to train the world’s most capable AI fronti...
Location
Location
United Kingdom , London
Salary
Salary:
Not provided
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modelling or data engineering work
  • OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work
  • OR equivalent experience
  • Expertise in multimodal Research with a strong publishing track record
  • Proven expertise in areas of interest, evidenced by an exceptional publication track record and/or significant technical leadership in high-impact projects
  • Strong analytical skills, attention to detail, and a commitment to data-driven decision-making
  • Experience and/or in-depth understandings about large-scale distributed systems
  • Ability to work collaboratively in a fast-paced, innovative environment
Job Responsibility
Job Responsibility
  • Develop algorithms, design model architectures, conduct experiments, champion measurement and evaluation, innovate datasets and data pipelines
  • Improve training and deployment efficiency, paying careful attention to detail, persevering, and learning from everyone’s attempts whether successful or not
  • Follow a rigorous data-driven approach grounded in meticulous ablation studies and scientific analysis
  • Innovate and iterate over ideas, prototypes, and product
  • Collaborate closely with teams on infrastructure, data engineering, pre-training, post-training, and product feedback
  • Advance the AI frontier responsibly
  • Embody our culture and values
  • Fulltime
Read More
Arrow Right

Member of Technical Staff, AI Multimodal - MAI Superintelligence Team

At Microsoft AI, we are on a mission to train the world’s most capable AI fronti...
Location
Location
Switzerland , Zürich
Salary
Salary:
Not provided
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modelling or data engineering work
  • OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work
  • OR equivalent experience
  • Expertise in multimodal Research with a strong publishing track record
  • Proven expertise in areas of interest, evidenced by an exceptional publication track record and/or significant technical leadership in high-impact projects
  • Strong analytical skills, attention to detail, and a commitment to data-driven decision-making
  • Experience and/or in-depth understandings about large-scale distributed systems
  • Ability to work collaboratively in a fast-paced, innovative environment
Job Responsibility
Job Responsibility
  • Develop algorithms, design model architectures, conduct experiments, champion measurement and evaluation, innovate datasets and data pipelines
  • Improve training and deployment efficiency, paying careful attention to detail, persevering, and learning from everyone’s attempts whether successful or not
  • Follow a rigorous data-driven approach grounded in meticulous ablation studies and scientific analysis
  • Innovate and iterate over ideas, prototypes, and product
  • Collaborate closely with teams on infrastructure, data engineering, pre-training, post-training, and product feedback
  • Advance the AI frontier responsibly
  • Embody our culture and values
  • Fulltime
Read More
Arrow Right

Member of Technical Staff - Data Platform

If you are excited by the challenge of designing distributed systems that proces...
Location
Location
United States , Mountain View; Redmond
Salary
Salary:
119800.00 - 234700.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 3+ years experience in business analytics, data science, software development, data modeling, or data engineering OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, data modeling, or data engineering OR equivalent experience
  • Proficiency in Python, Scala, Java, or Go
  • Deep Distributed Systems Knowledge: Demonstrated technical understanding of massive-scale compute engines (e.g., Apache Spark, Flink, Ray, Trino, or Snowflake)
  • Experience architecting Lakehouse environments at scale (using Delta Lake, Iceberg, or Hudi)
  • Experience building internal developer platforms or "Data-as-a-Service" APIs
  • Strong background in streaming technologies (Kafka, Azure EventHubs, Pulsar) and stateful stream processing
  • Experience with container orchestration (Kubernetes) for deploying data applications
  • Experience enabling AI/ML workloads (Feature Stores, Vector Databases)
Job Responsibility
Job Responsibility
  • Core Platform Engineering: Design and build the underlying frameworks (based on Spark/Databricks) that allow internal teams to process massive datasets efficiently
  • Distributed Systems Architecture: Modernize our data stack by moving from batch-heavy patterns to event-driven architectures
  • Unstructured AI Data Pipelines: Architect high-throughput pipelines capable of processing complex, non-tabular data (documents, code repositories, chat logs) for LLM pre-training, fine-tuning and evaluations datasets
  • AI Feedback Loops: Engineer the high-throughput telemetry systems that capture user interactions with Copilot
  • Infrastructure as Code: Treat the data platform as software. Define and deploy all storage, compute, and networking resources using IaC (Bicep/Terraform)
  • Data Reliability Engineering: Move beyond simple "validation checks" to build automated governance and observability systems
  • Compute Optimization: Deep-dive into query execution plans and cluster performance. Optimize shuffle operations, partition strategies, and resource allocation
  • Fulltime
Read More
Arrow Right
New

Member of Technical Staff - ML Engineer / Scientist (JP Localization)

At Liquid, we’re not just building AI models—we’re redefining the architecture o...
Location
Location
Japan , Tokyo
Salary
Salary:
Not provided
liquid.ai Logo
Liquid AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Deep understanding of the Japanese model evaluation landscape and familiarity with Japanese pre-training data sources
  • Experience using modeling and inference tools such as Huggingface inference, vLLM, and cloud APIs
Job Responsibility
Job Responsibility
  • Identify, collect, and curate diverse high-quality Japanese text, audio, and multimodal datasets
  • Design methods to synthetically generate or augment Japanese training data when needed
  • Ensure datasets meet enterprise-grade quality, coverage, and compliance requirements
  • Train and fine-tune language and vision models to achieve state-of-the-art performance for Japanese enterprise use cases
  • Adapt existing LFMs for Japanese language, culture, and enterprise-specific workflows
  • Implement evaluation frameworks to benchmark model quality on Japanese datasets
  • Design evaluation datasets and metrics for Japanese enterprise applications
  • Conduct thorough error analysis and iteratively improve model performance
  • Ensure robustness, fairness, and reliability in Japanese-language outputs
What we offer
What we offer
  • Hands-on experience with state-of-the-art technology at a leading AI company
  • The opportunity to directly shape foundation model performance in one of the world’s most complex and nuanced languages
  • A collaborative, fast-paced environment where your work drives the next generation of LFMs
  • Fulltime
Read More
Arrow Right
New

Member of Technical Staff - ML Research Engineer, Data

Our Data team powers Liquid Foundation Models across pre-training, vision, audio...
Location
Location
United States , San Francisco; Boston
Salary
Salary:
Not provided
liquid.ai Logo
Liquid AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong Python skills with the ability to quickly comprehend problems and translate them into clean, working code
  • Solid ML fundamentals: experience training, evaluating, and iterating on models (PyTorch preferred)
  • Track record of learning new technical domains quickly
  • 3+ years relevant experience with an M.S., or 1+ year with a Ph.D. (5+ years with a B.S.)
Job Responsibility
Job Responsibility
  • Build and maintain data processing, filtering, and selection pipelines at scale
  • Create pipelines for pretraining, midtraining, SFT, and preference optimization datasets
  • Design synthetic data generation systems using LLMs, structured prompting, and domain-specific generators
  • Design and run evaluations and ablations to measure dataset's impact on model performance
  • Monitor public datasets across text, vision, and audio domains
  • Collaborate with pre-training, vision, and audio teams on modality-specific data needs
What we offer
What we offer
  • Competitive base salary with equity in a unicorn-stage company
  • We pay 100% of medical, dental, and vision premiums for employees and dependents
  • 401(k) matching up to 4% of base pay
  • Unlimited PTO plus company-wide Refill Days throughout the year
  • Fulltime
Read More
Arrow Right
New

Member of Technical Staff, Next Generation Agents

Agentic LLM systems are being deployed widely across enterprise companies includ...
Location
Location
Salary
Salary:
Not provided
cohere.com Logo
Cohere
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong software engineering skills
  • Proficiency in Python and have some experience with ML-related code (e.g., pytorch, numpy, etc.)
  • Experience with LLMs and agentic frameworks
  • Experience with post-training LLMs (SFT, PEFT, or RL*)
  • Experience with building synthetic data generation pipelines
Job Responsibility
Job Responsibility
  • Design and develop novel agentic solutions
  • Improve upon SOTA on hard agentic tasks
  • Research the next-generation of on-line learning-from-experience self-improvement
  • Work with partner teams (Reasoning, Post-training, Pre-training, etc.) to improve performance of agentic system
  • Work with an amazing team of researchers and engineers pushing the boundaries
What we offer
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime
Read More
Arrow Right
New

Member of Technical Staff, Agents Modeling

We’re looking for an experienced machine learning researcher / engineer who can ...
Location
Location
United States , New York
Salary
Salary:
Not provided
cohere.com Logo
Cohere
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Have a PhD in computer science or related field or similar industry research experience
  • Strong software engineering skills
  • Proficiency in Python and experience with ML-related code (e.g., pytorch, numpy, etc.)
  • Experience with LLMs and agentic frameworks
  • Experience with post-training LLMs (SFT, PEFT, or RL*)
  • Experience with building synthetic data generation pipelines
Job Responsibility
Job Responsibility
  • Design and develop novel agentic solutions
  • Improve upon SOTA on hard agentic tasks
  • Research the next-generation of on-line learning-from-experience self-improvement
  • Work with partner teams (Reasoning, Post-training, Pre-training, etc.) to improve performance of agentic system
  • Work with an amazing team of researchers and engineers pushing the boundaries
What we offer
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime
Read More
Arrow Right
New

Pharmacy Technician

We’re building a world of health around every individual — shaping a more connec...
Location
Location
United States , Ocean View
Salary
Salary:
17.92 - 27.92 USD / Hour
https://www.cvshealth.com/ Logo
CVS Health
Expiration Date
April 07, 2026
Flip Icon
Requirements
Requirements
  • Must comply with any state board of pharmacy requirements or laws governing the practice of pharmacy, which includes but is not limited to, age, education, and licensure/certification
  • If the state board of pharmacy does not address or mandate a minimum age requirement, must be at least 16 years of age
  • If the state board of pharmacy does not address or mandate a minimum educational requirement, must have a high school diploma or equivalent, or be actively enrolled in high school or high school equivalency program
  • Regular and predictable attendance, including nights and weekends
  • Ability to complete required training within designated timeframe
  • Attention and Focus
  • Customer Service and Team Orientation
  • Communication Skills
  • Mathematical Reasoning
  • Problem Resolution
Job Responsibility
Job Responsibility
  • Support the pharmacy team in delivering operational and service excellence
  • Assist the pharmacy team to ensure that pharmacy operations run smoothly, our patients’ prescriptions are filled promptly, safely, and accurately
  • Operate as part of the pharmacy team through consistent application of Standard Operating Procedures (SOPs), best practices, and effective communication
  • Following pharmacy workflow procedures at each pharmacy workstation (i.e., production, pick-up, drive-thru, and drop-off) for safe and accurate prescription fulfillment
  • Contributing to positive patient experiences by showing empathy and genuine care
  • Completing basic inventory activities, as permitted by law, and as directed by the pharmacy leadership team
  • Contributing to a high-performing team, embracing a growth mindset, and being receptive to feedback
  • Remaining flexible for both scheduling and business needs, while contributing to a safe, inclusive, and engaging team dynamic
  • Understanding and complying with all relevant federal, state, and local laws, regulations, professional standards, and ethical principles
  • Delivering additional patient health care services (e.g., immunizations, point-of-care testing, and voluntarily staffing offsite clinics), where allowable by law and supported by required training and certification
What we offer
What we offer
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
  • Parttime
Read More
Arrow Right