
Data Curator and Annotator

Job Posted: December 14, 2025

Job Description

The Data Curator and Annotator will be responsible for curating, labeling, and maintaining high-quality datasets to support ML training, RAG pipelines, and evaluation. This role requires precision in annotation, strong attention to detail, and the ability to establish reliable guidelines and workflows. The ideal candidate will collaborate closely with engineers and data scientists to ensure datasets are accurate, secure, and aligned with business and research needs.

Job Responsibility

  • Curate and label datasets for ML training and evaluation
  • Define annotation guidelines and quality control processes
  • Develop efficient labeling workflows with quality gates
  • Ensure privacy, security, and bias mitigation in datasets
  • Collaborate with engineers and data scientists to improve data utility
  • Build trusted evaluation datasets for ranking and RAG tasks

Requirements

  • Experience labeling or curating datasets for NLP or search
  • Familiarity with annotation tools such as Label Studio or Prodigy
  • Strong attention to detail and commitment to labeling consistency
  • Comfort working with enterprise domain data
  • Experience with QA processes for annotation quality
  • Strong written communication for guideline creation
  • Respect for privacy, security, and ethical data principles

Nice to have

  • Domain knowledge in BFSI (banking, financial services, and insurance), retail, or healthcare
  • Experience creating evaluation datasets for LLMs
  • Multilingual annotation experience
  • Comfort with basic Python scripting

Similar Jobs for Data Curator and Annotator

8 matching positions

English Specialist

In this role, you'll apply your expertise to help train next-generation AI syste...
Location: India, Noida
Salary: Not provided
Company: AquSag Technologies
Expiration Date: Until further notice
Requirements
  • C1 level English proficiency or native speaker with exemplary written and verbal communication skills
  • Degree in English, linguistics, or a related field
  • Proven experience in language editing, content creation, or linguistic analysis
  • Strong analytical skills and keen attention to linguistic detail
  • Ability to work independently in a remote, collaborative team environment
  • Passion for language, communication, and innovative technology
  • Demonstrated excellence in both written and verbal English communication
Job Responsibility
  • Analyze and evaluate English language content for accuracy, consistency, and naturalness
  • Develop, curate, and refine text datasets to enhance the linguistic capabilities of AI models
  • Collaborate with AI trainers and engineers to provide expert feedback on language use, grammar, and context
  • Design tasks and guidelines for language annotation, ensuring quality and consistency across projects
  • Create written resources and best practice documentation for linguistic training
  • Participate in team discussions to identify language-related challenges and propose effective solutions
  • Proofread, edit, and optimize written materials to meet high communication standards
Employment Type: Full-time

Member of Technical Staff, Microsoft Robotics (Spatial AI)

Microsoft’s Discovery and Quantum (MDQ) division develops and delivers advanced ...
Location: United States, Redmond
Salary: 102100.00 - 202200.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 2+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) or consulting experience
  • OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Design, develop, and evaluate physical world models that capture 3D spatial structure, object geometry and pose, physics dynamics, material properties, and semantic scene understanding for robotic applications
  • Build and train world models (e.g., video prediction models, neural physics simulators, 3D generative models, scene graph representations) that predict future states of physical environments conditioned on robot actions, enabling model-based planning and policy learning
  • Develop spatial AI capabilities including 3D scene reconstruction, object detection and pose estimation, spatial relationship reasoning, occupancy prediction, and dense 3D feature representations for robot perception and planning
  • Implement and maintain evaluation frameworks for world models and spatial AI systems, including prediction accuracy metrics, planning performance benchmarks, and generalization testing across environments and object categories
  • Collaborate with robotics researchers, learning engineers, and simulation engineers to integrate world models into robot planning and control pipelines, enabling model-predictive control, imagination-based planning, and data-augmented training
  • Build data pipelines for training world models, including multi-sensor data fusion (RGB, depth, LiDAR, proprioception), scene annotation, and dataset curation for diverse physical environments and interaction scenarios
  • Write efficient, readable, extensible code in Python (including PyTorch, JAX, or TensorFlow) for model development, training, and evaluation, leveraging GPU computing infrastructure for large-scale experiments
  • Contribute to the formulation of the team's world modeling research and development roadmap, identifying high-impact technical directions and collaborating with leadership to prioritize investments
  • Present research findings and model evaluation results clearly and efficiently to internal stakeholders and external partners, contributing to technical publications, blog posts, and conference presentations
  • Stay current with state-of-the-art research in world models, spatial AI, 3D vision, neural physics simulation, and foundation models for physical understanding, actively contributing to the body of thought leadership in these areas
Employment Type: Full-time

Senior Associate, Data Scientist - US Card (Applied GenAI)

Senior Associate, Data Scientist - US Card (Applied GenAI). Data is at the cente...
Location: United States, McLean, Virginia; New York, New York
Salary: 135600.00 - 168900.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, one of the following, with an expectation that the required degree will be obtained on or before the scheduled start date:
      • A Bachelor's Degree in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field) plus 2 years of experience performing data analytics
      • A Master's Degree in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field) or an MBA with a quantitative concentration
Job Responsibility
  • Apply expertise in unstructured data (text, image) to harness the power of open source large language models (LLMs) and visual language models (VLMs)
  • Leverage a broad stack of technologies — LangGraph, LlamaIndex, Weights and Biases Weave, Hugging Face, PyTorch, AWS, and more — to automate workflows using huge volumes of text and vision data
  • Build machine learning and NLP models through all phases of development, from design through training, evaluation, and validation, partnering with engineering teams to operationalize them in scalable and resilient production systems that serve 80+ million customers
  • Assess GenAI or LLM-powered application architectures in production, including best practices for Generative AI development and deployments
  • Define requirements for AI observability, focusing on the traceability of autonomous decisions and comprehensive system audit trails
  • Evaluate the dynamic behavior of AI systems and oversee the development of key continuous monitoring controls and testing, ensuring that non-deterministic outputs and autonomous actions remain within risk appetite
  • Get into the weeds of internal business processes and data operations by guiding annotators to curate high quality, consistent datasets for model training, evaluation, and ongoing AI monitoring
  • Collaborate on a team of data scientists through all phases of project development, from design through training, evaluation, validation, implementation, and maintenance
  • Interact with a variety of internal stakeholders to ensure the alignment of data science solutions with business outcomes
What we offer
  • Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • Comprehensive, competitive, and inclusive set of health, financial, and other benefits
Employment Type: Full-time

Janitorial Supervisor

Join our team as a Janitorial Supervisor, where you will bridge industry experti...
Location: India, Noida
Salary: Not provided
Company: AquSag Technologies
Expiration Date: Until further notice
Requirements
  • Supervisory experience in housekeeping or janitorial services
  • Data annotation
  • Digital workflow documentation
  • Process optimization
  • Quality assurance
  • Written and verbal communication
  • Remote team collaboration
  • Project management tools
  • Analytical skills
  • Safety and compliance standards
Job Responsibility
  • Collaborate with AI development teams to annotate, curate, and evaluate data for training advanced AI models in housekeeping and janitorial contexts
  • Utilize your supervisory experience to design realistic workflows and scenario-based data sets for machine learning purposes
  • Provide detailed feedback to AI trainers and engineers on emerging trends, best practices, and frontline challenges in cleaning and facility maintenance supervision
  • Ensure that AI models accurately reflect the nuances of task delegation, quality control, and safety compliance within housekeeping operations
  • Engage in regular review sessions to refine AI behavior, ensuring it aligns with industry standards and customer requirements
  • Document processes and communicate complex concepts clearly in both written and verbal formats to cross-functional teams
  • Champion the integration of practical supervisory knowledge into technological solutions to drive operational excellence
Employment Type: Full-time

Member of Technical Staff

The Microsoft AI Superintelligence (MAIST) Post Training team is dedicated to ad...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate OR equivalent experience
  • Significant experience in large-scale model training, data curation, and hands-on coding, ideally from leading research labs
  • Deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Ability to develop LLMs, SLMs, multimodal, and coding models using both proprietary and open-source frameworks
  • Self-driven, able to write efficient code and debug training jobs, document findings, and demonstrate a track record in these fields
  • Curious, adaptable problem-solver who thrives on continuous learning, embraces changing priorities, and is motivated by creating meaningful impact
Job Responsibility
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
  • Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
  • Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
  • Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
  • Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
  • Embody our Culture and Values
Employment Type: Full-time

Member of Technical Staff

The Microsoft AI Superintelligence (MAIST) Post Training team is dedicated to ad...
Location: United States, Redmond
Salary: 100600.00 - 199000.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master's Degree in relevant field AND 1+ year(s) related research experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility
  • Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
  • Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
  • Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
  • Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
  • Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
  • Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
  • Embody our Culture and Values
Employment Type: Full-time

Senior Lead Machine Learning Engineer

This role combines hands-on machine learning engineering with technical leadersh...
Location: Australia, Brisbane
Salary: 150000.00 - 190000.00 AUD / Year
Company: Reqiva
Expiration Date: Until further notice
Requirements
  • Degree in Computer Science, Engineering, Data Science, or equivalent experience
  • Strong experience developing production machine learning systems
  • Hands-on expertise with deep learning frameworks such as PyTorch
  • Experience developing computer vision models
Job Responsibility
  • Design, train, and evaluate computer vision models including object detection, classification, and segmentation
  • Conduct structured experimentation and performance analysis to guide model architecture decisions
  • Optimise models for edge deployment across embedded hardware platforms
  • Convert and deploy models using frameworks such as TensorRT, ONNX, or TFLite
  • Support development of data pipelines, dataset curation, and annotation workflows
  • Benchmark and tune model performance for latency, accuracy, and operational constraints
  • Contribute to ML assurance practices including validation, benchmarking, and regression testing
  • Mentor engineers and contribute to the growth of the ML capability
Employment Type: Full-time

Senior Legal Engineer

This role sits at the intersection of legal practice and legal technology and pl...
Location: United States, Oakland
Salary: 163000.00 - 206000.00 USD / Year
Company: Everlaw
Expiration Date: Until further notice
Requirements
  • 3+ years of relevant legal experience, JD preferred
  • Exceptional written communication and a strong interest in the application of AI in the field of law
  • Familiarity with a coding/scripting language or coding agents (e.g., gemini-cli, claude code, codex-cli)
  • Experience with generative AI in legal work, including evaluation of outputs, design/interpretation of metrics, and abstracting legal reasoning into repeatable steps
Job Responsibility
  • Prompt Optimization & Workflow Development: Systematically discover and test prompt engineering best practices
  • Drive product enhancements by translating legal expertise into features
  • Collaborate with Engineering on LLM calls
  • Review and validate product outputs for quality
  • Translate client requests into concrete, testable prompts and evaluation criteria
  • Draft and iterate on system/developer prompts for AI systems
  • Benchmark Design and Evaluation: Assess foundational model performance and suitability for various legal tasks
  • Collaborate with Product and customers to optimize genAI legal workflows
  • Develop datasets that accurately model legal tasks
  • Lead the creation of cutting-edge legal AI evaluations
What we offer
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Monthly home internet reimbursement
Employment Type: Full-time