Research Engineer, Data Infrastructure

United States, Palo Alto · Employment contract · 180000.00 - 250000.00 USD / Year · Job Posted December 01, 2025

Job Description

As a Research Engineer in Data Infrastructure, you will design and implement a “data engine” that uploads the data collected by the robot fleet and makes it easy to query and train on. Your work ensures that high‑quality data pipelines are built and maintained, enabling rapid model development, large‑scale annotation, and smooth integration between on‑robot, on‑premise, and cloud systems.

Job Responsibility

  • Optimize operational efficiency of data collection on the NEO fleet
  • Design triggers on the robot to determine if and when data should be uploaded
  • Automate ETL pipelines so fleet‑wide data is easily queryable and available for training
  • Work with external dataset providers to prepare diverse multi-modal pre-training datasets
  • Build frontend tools for visualizing and automating labeling of very large datasets
  • Develop machine learning models to automatically label and organize datasets

Requirements

  • Strong experience in building data pipelines and ETL systems
  • Ability to design and implement systems that collect, upload, and manage data from robotic fleets
  • Familiarity with architectures combining on‑robot components, on‑premises clusters, and cloud systems
  • Experience with data labeling tools or building tooling for dataset visualization and annotation
  • Skills in creating or applying machine learning models for dataset organization / automated labeling

What we offer

  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays

Similar Jobs for Research Engineer, Data Infrastructure

8 matching positions

AI Research Engineer, Data Infrastructure

As a Research Engineer in Infrastructure, you will design and implement a robust...
Location: United States, Palo Alto
Salary: 180000.00 - 250000.00 USD / Year
1X Technologies
Expiration Date: Until further notice
Requirements
  • Strong experience in building data pipelines and ETL systems
  • Ability to design and implement systems for data collection and management from robotic fleets
  • Familiarity with architectures that span on-robot components, on-premise clusters, and cloud infrastructure
  • Experience with data labeling tools or building dataset visualization and annotation tooling
  • Proficiency in creating or applying machine learning models for dataset organization and automated labeling
Job Responsibility
  • Optimize operational efficiency of data collection across the NEO robot fleet
  • Design intelligent triggers to determine when and what data should be uploaded from the robots
  • Automate ETL pipelines to make fleet-wide data easily queryable and training-ready
  • Collaborate with external dataset providers to prepare diverse multi-modal pre-training datasets
  • Build frontend tools for visualizing and automating the labeling of large datasets
  • Develop machine learning models for automatic dataset labeling and organization
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Full-time

Software Engineer, Data Infrastructure - Research

The Workload team is responsible for designing and running OpenAI’s LLM training...
Location: United States, San Francisco
Salary: 250000.00 - 380000.00 USD / Year
OpenAI
Expiration Date: Until further notice
Requirements
  • Strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure
  • Experience building APIs, modular code, and scalable abstractions
  • Comfortable debugging bottlenecks across large fleets of machines
  • Pride in building infrastructure that 'just works'
  • Collaborative, humble, and excited to own a foundational part of the ML stack
Job Responsibility
  • Design and implement the dataset infrastructure that powers OpenAI’s next-generation training stack
  • Design and maintain standardized dataset APIs, including for multimodal (MM) data that cannot fit in memory
  • Build proactive testing and scale validation pipelines for dataset loading at GPU scale
  • Collaborate with teammates to integrate datasets seamlessly into training and inference pipelines
  • Document and maintain dataset interfaces so they are discoverable, consistent, and easy for other teams to adopt
  • Establish safeguards and validation systems to ensure datasets remain reproducible and unchanged once standardized
  • Debug and resolve performance bottlenecks in distributed dataset loading
  • Provide visualization and inspection tools to surface errors, bugs, or bottlenecks in datasets
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Full-time

Research Engineer, Text Data Research - MSL FAIR

Meta is seeking AI research engineers to help us build the data foundation for M...
Location: United States, Menlo Park
Salary: 257000.00 USD / Year
Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 2+ years of industry research experience in LLM/NLP or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross-functional impact, and/or influencing strategy across multiple teams
  • Practical experience with pre-training or mid-training data curation for large foundational models and experience working with organic, synthetic, agentic, or reasoning data for LLMs
  • Demonstrated data infrastructure and software background, and experience building data tooling and services
  • Published research in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in agentic data, synthetic data, reasoning data, web parser, coding data, data scaling laws, or datamix optimization
  • Lead complex technical projects end-to-end
What we offer
  • bonus
  • equity
  • benefits
  • Full-time

Research Engineer, Media Data Research - MSL FAIR

Meta is seeking AI research engineers to help us build the data foundation for M...
Location: United States, Menlo Park
Salary: 217000.00 USD / Year
Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 1+ year of industry research experience in LLM/LMM, computer vision, or related AI/ML models
  • Experience owning and/or driving complex technical projects from end-to-end
  • Practical experience with multimodal pre-training or mid-training data curation for large media perception or generation models
  • Demonstrated data infrastructure and software background, and experience building data tooling and services
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, data scaling laws, or data mixing
  • Lead complex technical projects end-to-end
What we offer
  • bonus
  • equity
  • benefits

Research Engineer, Media Data Research - MSL FAIR

Meta is seeking AI research engineers to help us build the data foundation for M...
Location: United States, Menlo Park
Salary: 257000.00 USD / Year
Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 2+ years of industry research experience in LLM/NLP, computer vision, or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross-functional impact, and/or influencing strategy across multiple teams
  • Practical experience with multimodal pre-training or mid-training data curation for large media perception or generation models
  • Demonstrated data infrastructure and software background, and experience building data tooling and services
  • Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI
Job Responsibility
  • Collaborate with cross-functional teams to develop Meta’s next foundational models
  • Architect efficient and scalable data curation systems and pipelines
  • Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
  • Execute on high priority projects in pre-training, mid-training, or post-training data curation
  • Apply specialized expertise in video/image generation, video/image perception, OCR, data scaling laws, or data mixing
  • Lead complex technical projects end-to-end
What we offer
  • bonus
  • equity
  • benefits

Data Research Engineer

Fundamental is an AI company pioneering the future of enterprise decision-making...
Location: Spain, Barcelona
Salary: Not provided
Fundamental
Expiration Date: Until further notice
Requirements
  • Experience identifying good data sources to train and evaluate ML models, including real-world and realistic synthetic data sources
  • Experience bringing data from structured and unstructured sources, as well as simulators and causal models, into formats accessible by ML models
  • Strong fundamentals of software engineering
  • Strong knowledge of Python and the Python data processing stack (numpy, pandas, …)
  • Familiarity with distributed processing (e.g. Ray, Dask, Spark, Beam) and data storage solutions
  • Basic ML knowledge
Job Responsibility
  • Helping to identify, characterize and evaluate data sources, including realistic synthetic data generated from Structured Causal Models and physical / systems-based simulators
  • Building and maintaining ETL pipelines
  • Designing and implementing scalable, reliable data storage solutions
  • Collaborating with the rest of the research team to maintain a reliable, efficient training pipeline where data is a critical component
  • Collaborating with the wider engineering and infrastructure team
What we offer
  • Competitive compensation with salary and equity
  • Comprehensive health coverage for you and your dependents
  • Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys
  • Relocation support for employees moving to join the team in one of our office locations
  • A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action
  • Full-time

Software Engineer, Data Infrastructure

Data Platform at OpenAI owns the foundational data stack powering critical produ...
Location: United States, San Francisco
Salary: 185000.00 - 385000.00 USD / Year
OpenAI
Expiration Date: Until further notice
Requirements
  • 4+ years in data infrastructure engineering OR 4+ years in infrastructure engineering with a strong interest in data
  • Take pride in building and operating scalable, reliable, secure systems
  • Comfortable with ambiguity and rapid change
  • Intrinsic desire to learn and fill in missing skills
  • Strong talent for sharing learnings clearly and concisely with others
  • Supported Spark, Kafka, Flink, Airflow, Trino, or Iceberg as platforms
  • Well-versed in infrastructure tooling like Terraform
  • Experienced in debugging large-scale distributed systems
  • Excited about solving data infrastructure problems in the AI space
Job Responsibility
  • Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security
  • Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient
  • Accelerate company productivity by empowering your fellow engineers & teammates with excellent data tooling and systems
  • Collaborate with product, research, and analytics teams to build the technical foundations and capabilities that unlock new features and experiences
  • Own the reliability of the systems you build, including participation in an on-call rotation for critical incidents
  • Take full lifecycle ownership: architecture, implementation, production operations, and on-call participation
  • Scale and harden big data compute and storage platforms
  • Build and support high-throughput streaming systems
  • Build and operate low-latency data ingestion
  • Enable secure and governed data access for ML and analytics
What we offer
  • Equity
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Full-time