Member of Technical Staff, Machine Learning Datasets

United States · 270000.00 - 370000.00 USD / Year · Job Posted January 10, 2026

Job Description

We are building AI to simulate the world by merging art and science. We believe that world models are at the frontier of progress in artificial intelligence. Language models alone won’t solve the world’s hardest problems – robotics, disease, scientific discovery. Real progress requires models that experience the world and learn from their mistakes, the same way that humans do. This kind of trial and error can be massively accelerated when done in simulation rather than in the real world. World models offer the clearest path to general-purpose simulation, changing how stories are told, how scientific progress is made, and how the next frontiers of humanity are reached.

Job Responsibility

  • Develop and maintain large-scale, multimodal datasets for training and evaluating models
  • Optimize models for data preprocessing tasks
  • Create and run evaluations and benchmark analyses for datasets and models
  • Implement fast iteration cycles and feedback loops to continuously improve model datasets
  • Work with a world-class research team to push the boundaries of content creation
  • Evaluate new datasets and models for upstream data tasks that feed into our products

Requirements

  • 4+ years of relevant experience in machine learning or dataset engineering, ideally with multimodal datasets
  • Experience with running and optimizing models offline at large scale
  • Excellent data modeling skills and experience with data curation
  • Proficiency in model finetuning and optimization for data preprocessing
  • Strong data analysis and SQL skills
  • Experience in creating evaluations and running benchmark analyses
  • Solid knowledge of at least one machine learning framework (e.g. PyTorch, JAX, TensorFlow)
  • Very strong programming skills and ability to write clean and maintainable code
  • Deep interest in building human-in-the-loop systems for creativity
  • Ability to rapidly prototype solutions and iterate on them with tight product deadlines
  • Strong familiarity with tools such as Ray, Kubernetes, Airflow, Prefect
  • Excellent communication, collaboration, and documentation skills

Similar Jobs for Member of Technical Staff, Machine Learning Datasets

8 matching positions

Member of Technical Staff, Microsoft Robotics (Robot Learning)

Microsoft's Discovery and Quantum (MDQ) division develops and delivers advanced ...
Location
United States, Redmond
Salary
102100.00 - 202200.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Develop and train end-to-end robot learning models, including vision-language-action (VLA) family of models, imitation learning policies, and reinforcement learning agents for manipulation, locomotion, and navigation tasks
  • Build, maintain, and optimize data pipelines for robot learning, including collection infrastructure for teleoperation demonstrations, data preprocessing, augmentation, quality filtering, and dataset versioning
  • Train machine learning and deep learning models on GPU computing clusters, implementing distributed training, hyperparameter optimization, curriculum learning, and training infrastructure automation
  • Deploy trained models to physical robot platforms, conducting real-world evaluation, debugging sim-to-real transfer issues, and iterating on model performance based on deployment feedback
  • Implement and maintain evaluation frameworks for robot learning models, including standardized task benchmarks, success rate tracking, generalization testing across objects and environments, and regression detection
  • Collaborate with robotics researchers, simulation engineers, and platform engineers to improve the end-to-end model development lifecycle, from data collection through deployment and monitoring
  • Write production-quality code in Python (including NumPy, PyTorch, JAX) that is well-tested, maintainable, and extensible, adhering to team coding standards and best practices
  • Review code and technical designs, providing feedback to develop other engineers' skills and drive adherence to coding patterns, security practices, and engineering excellence standards
  • Stay current with state-of-the-art research in robot learning, foundation models for robotics, and physical AI, evaluating new model technologies and techniques for adoption and integration into the platform
  • Contribute to internal knowledge sharing through technical documentation, brown bag sessions, blog posts, and mentoring of team members
  • Fulltime

Member of Technical Staff, Microsoft Robotics (Robotics Data)

Microsoft’s Discovery and Quantum (MDQ) division develops and delivers advanced ...
Location
United States, Redmond
Salary
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data-science experience
  • OR Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility
  • Define and implement data collection strategies for robot learning, including specifying demonstration coverage requirements, environmental diversity targets, task distribution plans, and quality acceptance criteria for teleoperation, egocentric, and autonomous data collection campaigns
  • Build and maintain data curation pipelines that ingest, clean, validate, label, and version robotics datasets (manipulation demonstrations, navigation trajectories, sensor logs, simulation rollouts), ensuring data integrity and provenance tracking
  • Develop data analysis frameworks that quantify dataset characteristics (coverage, diversity, balance, quality scores), identify data gaps and biases, and provide recommendations for targeted data collection to improve model performance
  • Create interactive data visualization tools and dashboards (using tools such as Power BI, Plotly, or custom web applications) that enable researchers, engineers, and leadership to explore dataset properties, model training metrics, evaluation results, and fleet operational telemetry
  • Collaborate with ML researchers and learning engineers to design and execute experiments that measure the impact of data quantity, quality, and diversity on model performance, producing statistical analyses that guide data investment decisions
  • Formulate and maintain a roadmap of data science project activity that leads to measurable improvement in model performance metrics, data pipeline efficiency, and data quality over time
  • Develop and apply statistical techniques (hypothesis testing, causal inference, regression analysis, clustering) to analyze robot performance data, identify failure modes, and uncover patterns that inform model architecture and training strategy decisions
  • Write efficient, readable, extensible code in Python (including Pandas, NumPy, scikit-learn, matplotlib) for data processing, analysis, and visualization, building professional-grade documentation for knowledge transfer
  • Adhere and contribute to ethics and privacy policies related to collecting and preparing robotics data, providing guidance on responsible data practices including bias detection, consent, and data governance
  • Present results and findings to senior stakeholders, using compelling visualizations and storytelling to influence data investment priorities and model development strategy
What we offer
  • You may be eligible for benefits and other compensation
  • Fulltime

Member of Technical Staff, Microsoft Robotics (Software Systems)

Microsoft's Discovery and Quantum (MDQ) division develops and delivers advanced ...
Location
United States, Redmond
Salary
142800.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Architect and implement core platform components, including robotics SDKs, cloud-hosted Application Programming Interfaces (APIs), edge runtimes, and agent orchestration frameworks that enable developers and partners to compose interoperable autonomy capabilities (perception, planning, control, multi-agent coordination) into deployable mission workflows
  • Design the platform's extensibility and integration architecture, defining how first-party autonomy capabilities, first- and third-party models, partner hardware systems, and customer-specific logic are composed, versioned, tested, and deployed across cloud and edge environments
  • Build production-grade data infrastructure spanning the full robotics lifecycle including instrumentation libraries, data acquisition services, human-in-the-loop workflows, dataset versioning and curation pipelines, and data quality governance supporting both real-world and synthetic/simulated data at scale
  • Own cross-cutting platform concerns including authentication and authorization across cloud-edge boundaries, API versioning and backward compatibility, multi-tenant isolation, and performance at the latencies required by real-time robotic control loops
  • Drive the developer experience for the Microsoft Robotics platform, including defining the Command Line Interface (CLI), SDK patterns, documentation strategy, sample code, and inner-loop development workflow that make it fast and reliable for internal teams and external partners to build on the platform
  • Collaborate with autonomy, simulation, and evaluation teams to ensure that platform primitives (compute orchestration, data routing, model serving, experiment tracking) meet the performance, reliability, and reproducibility requirements of Machine Learning (ML) training, sim-to-real transfer, and online evaluation workloads
  • Lead technical design reviews, write architecture decision records, and establish engineering practices for the platform team, mentoring senior engineers and raising the bar for code quality, testing, and operational readiness across the organization.
What we offer
  • You may be eligible for benefits and other compensation
  • Find additional benefits and pay information at the provided link.
  • Fulltime

Member of Technical Staff

The Microsoft AI Superintelligence (MAIST) Post Training team is dedicated to ad...
Location
United States, Redmond
Salary
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Doctorate OR equivalent experience
  • Significant experience in large-scale model training, data curation, and hands-on coding, ideally from leading research labs
  • Deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Ability to develop LLMs, SLMs, multimodal, and coding models using both proprietary and open-source frameworks
  • Self-driven, able to write efficient code and debug training jobs, document findings, and demonstrate a track record in these fields
  • Curious, adaptable problem-solver who thrives on continuous learning, embraces changing priorities, and is motivated by creating meaningful impact
Job Responsibility
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
  • Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
  • Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
  • Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
  • Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
  • Embody our Culture and Values
  • Fulltime

Member of Technical Staff

The Microsoft AI Superintelligence (MAIST) Post Training team is dedicated to ad...
Location
United States, Redmond
Salary
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master's Degree in relevant field AND 1+ year(s) related research experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
  • Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
  • Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
  • Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
  • Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
  • Embody our Culture and Values
  • Fulltime

Member of Technical Staff - Post-Training

This Microsoft AI Superintelligence Post-Training team is dedicated to advancing...
Location
United States, Redmond
Salary
84200.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree (complete or in progress) in relevant field AND 3+ months related research internship experience OR Master's Degree in relevant field OR equivalent experience
  • Software engineering skills with fluency in Python and modern data libraries
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
  • Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
  • Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
  • Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
  • Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
  • Embody our Culture and Values
  • Fulltime

Member of Technical Staff - Post-Training

This Microsoft AI Superintelligence Post-Training team is dedicated to advancing...
Location
United States, Redmond
Salary
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Doctorate in relevant field OR equivalent experience
  • Software engineering skills with fluency in Python and modern data libraries
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
  • Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
  • Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
  • Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
  • Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
  • Embody our Culture and Values
  • Fulltime

Member of Technical Staff

The Microsoft AI Superintelligence Post Training team is dedicated to advancing ...
Location
United States, Redmond
Salary
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Doctorate in relevant field AND 3+ years related research experience OR Master's Degree in relevant field AND 4+ years related research experience OR Bachelor's Degree in relevant field AND 6+ years related research experience OR equivalent experience
  • 5+ years of coding experience in Python and experience with ML frameworks such as PyTorch and Triton
  • 3+ years of experience in data curation and synthesis, creating and refining datasets to optimize training outcomes
  • 3+ years of proven ability to design and scale training infrastructure and pipelines in production environments
  • 3+ years of large-scale model training - especially with LLMs, SLMs, multimodal, or code-specific models
  • Prior research publication record with over 3000 citations
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Perform large-scale model training - Especially with LLMs, SLMs, multimodal, or code-specific models
  • Perform data curation and synthesis - Creating and refining datasets to optimize training outcomes
  • Hands-on coding - Write efficient, production-quality code and debug complex training jobs
  • Work on both proprietary and open-source frameworks - Demonstrated proficiency in training pipelines and architecture
  • Full-stack modeling responsibility - From data ingestion and training to evaluation and inference management
  • Contribute to or build on existing innovations, such as the technical reports of well-known models
  • Develop novel AI solutions that bridge language, vision, and code understanding
  • Help develop models powering tools like GitHub Copilot, Cursor, and VS Code suggestions
  • Embody our Culture and Values
  • Fulltime