ML Infra Engineer (Data Systems)

United States, San Francisco · Job Posted February 21, 2026

Job Description

As an ML Infra Engineer (Data Systems), you’ll build and operate the data infrastructure that powers large-scale robot learning. Your systems will sit directly between raw data sources and training/evaluation, enabling us to move faster while maintaining performance, correctness, and reliability at scale. This is a systems role at the intersection of distributed systems, storage, and machine learning infrastructure.

Job Responsibility

  • Data Ingestion & Processing: Design and build high-throughput pipelines that validate, transform, and featurize raw multimodal data
  • Batch & Streaming Systems: Operate large-scale batch and streaming workflows over massive datasets
  • Storage Systems: Design object storage layouts, metadata systems, and efficient access patterns; choose file formats with performance and scalability in mind
  • Data Lifecycle Management: Build systems for backfills, dataset rebuilds, garbage collection, and large-scale transformations
  • Training-Time Performance: Optimize dataloaders, sharding, prefetching, caching, and throughput to reduce time from data arrival → model training
  • Metadata & Indexing: Build scalable metadata stores for datasets, annotations, and training artifacts
  • Data Movement: Move hundreds of terabytes to petabytes efficiently across clusters and environments
  • Operational Correctness: Implement observability, validation, and guardrails to prevent silent data regressions
  • Cross-Functional Collaboration: Work closely with researchers, engineers, and roboticists to translate evolving data needs into robust systems

Requirements

  • Strong software engineering fundamentals
  • Experience building distributed systems or large-scale data pipelines
  • Comfort reasoning about performance, memory, I/O, and storage efficiency
  • Familiarity with batch and/or streaming processing systems
  • Experience with object storage systems and data format tradeoffs
  • Ownership mindset: design, build, operate, and iterate on systems end-to-end
  • Enjoy working closely with researchers and unblocking fast-moving projects

Nice to have

  • Experience with large ML training pipelines or dataloading systems
  • Knowledge of columnar or custom data formats
  • Experience with systems like ClickHouse, Ray, Flink, Spark, or similar
  • Hands-on experience operating petabyte-scale datasets
  • Experience debugging and fixing performance bottlenecks in data-heavy systems

Similar Jobs for ML Infra Engineer (Data Systems)

8 matching positions

Data Engineer

Influur is redefining how advertising works, through creators, data, and AI. Our...
Location: Mexico, Mexico City
Salary: Not provided
Company: Influur
Expiration Date: Until further notice
Requirements
  • Strong programming with Python and SQL
  • Comfortable building from scratch and improving existing code
  • Expertise in data modeling and warehousing, including dimensional modeling and performance tuning
  • Experience designing and operating ETL and ELT pipelines with tools like Airflow or Dagster, plus dbt for transformations
  • Hands-on with batch and streaming systems and with Lakehouse or warehouse tech on AWS or GCP
  • Proficiency integrating third-party APIs and datasets, ensuring reliability, lineage, and governance
  • Familiarity with AI data needs: feature stores, embedding pipelines, vector databases, and feedback loops that close the gap between model and outcome
  • High standards for code quality, testing, observability, and CI
  • Comfortable with Docker and modern cloud infra
Job Responsibility
  • Treats data as a product and ships improvements that users feel
  • Moves fast without breaking trust
  • Owns problems across the stack, from ingestion to modeling to serving
  • Communicates clearly with ML engineers, analysts, and business partners
  • Experiments, measures, and iterates
  • Sees ambiguity as a chance to design the standard everyone else will follow
What we offer
  • Competitive equity in a venture-backed company
  • Opportunities to grow and develop
  • Remote work
  • Fulltime

Data Engineer

Influur is redefining how advertising works, through creators, data, and AI. Our...
Location: Colombia, Bogotá
Salary: Not provided
Company: Influur
Expiration Date: Until further notice
Requirements
  • Strong programming with Python and SQL
  • Comfortable building from scratch and improving existing code
  • Expertise in data modeling and warehousing, including dimensional modeling and performance tuning
  • Experience designing and operating ETL and ELT pipelines with tools like Airflow or Dagster, plus dbt for transformations
  • Hands-on with batch and streaming systems and with Lakehouse or warehouse tech on AWS or GCP
  • Proficiency integrating third-party APIs and datasets, ensuring reliability, lineage, and governance
  • Familiarity with AI data needs: feature stores, embedding pipelines, vector databases, and feedback loops that close the gap between model and outcome
  • High standards for code quality, testing, observability, and CI
  • Comfortable with Docker and modern cloud infra
Job Responsibility
  • Treats data as a product and ships improvements that users feel
  • Moves fast without breaking trust. You value contracts, schemas, and backward compatibility
  • Owns problems across the stack, from ingestion to modeling to serving
  • Communicates clearly with ML engineers, analysts, and business partners
  • Experiments, measures, and iterates. You set measurable SLAs and keep them green
  • Sees ambiguity as a chance to design the standard everyone else will follow
What we offer
  • Competitive equity in a venture-backed company
  • Opportunities to grow and develop
  • Remote work
  • Fulltime

Data Engineer

Influur is redefining how advertising works, through creators, data, and AI. Our...
Location: Argentina, Buenos Aires
Salary: Not provided
Company: Influur
Expiration Date: Until further notice
Requirements
  • Strong programming with Python and SQL
  • Comfortable building from scratch and improving existing code
  • Expertise in data modeling and warehousing, including dimensional modeling and performance tuning
  • Experience designing and operating ETL and ELT pipelines with tools like Airflow or Dagster, plus dbt for transformations
  • Hands-on with batch and streaming systems and with Lakehouse or warehouse tech on AWS or GCP
  • Proficiency integrating third-party APIs and datasets, ensuring reliability, lineage, and governance
  • Familiarity with AI data needs: feature stores, embedding pipelines, vector databases, and feedback loops that close the gap between model and outcome
  • High standards for code quality, testing, observability, and CI
  • Comfortable with Docker and modern cloud infra
Job Responsibility
  • Treats data as a product and ships improvements that users feel
  • Moves fast without breaking trust. You value contracts, schemas, and backward compatibility
  • Owns problems across the stack, from ingestion to modeling to serving
  • Communicates clearly with ML engineers, analysts, and business partners
  • Experiments, measures, and iterates. You set measurable SLAs and keep them green
  • Sees ambiguity as a chance to design the standard everyone else will follow
What we offer
  • Competitive equity in a venture-backed company
  • Opportunities to grow and develop
  • Remote work
  • Fulltime

Principal Engineer - Marketplace

Principal Engineer role in the Marketplace Engineering team to lead breakthrough...
Location: United States, San Francisco; Sunnyvale
Salary: 302,000 - 336,000 USD / year
Company: Uber
Expiration Date: Until further notice
Requirements
  • PhD in Computer Science, Machine Learning, Operations Research, or related quantitative field OR Master’s degree with 12+ years of industry experience
  • 10+ years of experience building and deploying ML models in large-scale production environments
  • Expert-level proficiency in modern ML frameworks (TensorFlow, PyTorch, JAX) and distributed computing platforms (Spark, Ray)
  • Deep expertise across multiple areas including: Deep Learning, Causal Inference, Reinforcement Learning, Multi-objective Optimization, Algorithmic Game Theory, and Large-scale Ads Ranking/Auction Systems
  • Proven track record of leading complex ML projects from research through production with significant measurable business impact
  • Strong programming skills in Python, Java, or Go with experience building production ML systems
  • Experience with feature engineering, model serving, and ML infrastructure at scale (handling millions of predictions per second)
  • Technical leadership experience including mentoring senior engineers and driving cross-team technical initiatives
  • Advanced Deep Learning and Neural Network architectures
  • Scalable ML architecture and distributed model training
Job Responsibility
  • Lead the design and implementation of advanced ML systems for dynamic pricing algorithms serving millions of drivers across 70+ countries around the world
  • Architect real-time ML infrastructure handling 1M+ pricing decisions per second with sub-50ms latency requirements
  • Drive breakthrough research in causal ML, reinforcement learning, algorithmic game theory, and multi-objective optimization for marketplace optimization with strategic agents
  • Own end-to-end ML model lifecycle from research through production deployment and continuous optimization
  • Develop and enforce best practices in system design, ensuring data integrity, security, and optimal performance
  • Serve as a representative for the Marketplace organization to the broader internal and external technical community
  • Contribute to the engineering brand for Marketplace and serve as a talent magnet to help attract and retain talent for the team
  • Stay abreast of industry trends and emerging technologies in software engineering, particularly ML/AI, to continually enhance our systems and processes
  • Build scalable ML architecture and feature management systems supporting Driver Pricing and broader Marketplace teams
  • Design experimentation frameworks enabling rapid testing of pricing algorithms using A/B, Switchback, Synthetic Control, and other experimental methodologies
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible to participate in a 401(k) plan
  • Eligible for various benefits (details at provided link)
  • Fulltime

Senior AI Data Engineer

We are looking for a Senior AI Data Engineer to join a high-impact AI product in...
Location: United States
Salary: Not provided
Company: Velvetech
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in Data Engineering / ML Engineering / AI Engineering
  • Strong programming skills in Python
  • Hands-on experience with PyTorch (training and deploying deep learning models)
  • Experience working with Vertex AI or similar ML platforms (GCP preferred)
  • Proven experience with vector databases (Milvus, Pinecone, or similar)
  • Strong knowledge of: Feature engineering techniques, Model evaluation and validation frameworks, Predictive inference systems
  • Experience with multiple database paradigms: Relational (PostgreSQL), Time-series (InfluxDB), Graph (Neo4j)
  • Solid understanding of embeddings and semantic/vector search systems
  • Experience implementing model lifecycle management, including: Drift detection, Monitoring, Governance
  • Strong understanding of scalable system design and performance optimization
Job Responsibility
  • Own and develop the biometric extraction model lifecycle (training, validation, deployment)
  • Design and maintain a vector memory layer using tools such as Milvus or Pinecone
  • Build and optimize predictive inference services for real-time and batch use cases
  • Develop and maintain data pipelines for PFM (Personal Financial Management) data preparation
  • Implement advanced feature engineering frameworks and model evaluation pipelines
  • Work with Vertex AI for model training, deployment, and orchestration
  • Manage and integrate heterogeneous data storage systems: InfluxDB (time-series data), PostgreSQL (relational data), Neo4j (graph data)
  • Develop vector embeddings pipelines and similarity search logic
  • Implement model governance processes: Drift detection and monitoring, Shadow-mode validation, Performance tracking and reporting
  • Design and apply optimization policies for inference latency, cost, and accuracy
What we offer
  • Flexible working conditions
  • Cooperative environment
  • Competitive salary
  • Many challenging and exciting projects with new opportunities and learning
  • Growth opportunities, skill and competency development, and professional certification
  • In-company training (English, Software / DevOps / Project management / Design / Business)
  • Fulltime

Staff Data Engineer

At Vanta, our mission is to help businesses earn and prove trust. We believe tha...
Location: United States
Salary: 213,000 - 251,000 USD / year
Company: Vanta
Expiration Date: Until further notice
Requirements
  • Have at least six years of experience working with data
  • Have at least two years of experience in Software Engineering or a related field
  • Have experience with common analytics tooling (e.g. Stitch/Fivetran, Snowflake/BigQuery/Redshift, dbt, Airflow, Dagster, Looker/Mode/Sigma)
  • Have led an implementation of Debezium or another CDC event tracking system
  • Have good working knowledge of AWS data infra systems and Terraform
  • Bring a system-oriented and software engineering mindset to the Data Engineering practice
  • Have deep knowledge of crafting dimensional and fact models following modern data modeling practices
  • Have a passion for enabling the developer experience of data
  • Have a desire to lead the industry in security, anonymization, and compliance management for data warehousing
  • Be open to using AI to amplify your skills and strengthen your work
Job Responsibility
  • Design and implement complex data models, model metadata, build reports and dashboards, and create reporting tools for users of data science and ML products
  • Design and deploy data infrastructure needed to drive data-driven decision-making solutions
  • Be the company’s expert on data administration and master data management
  • Be a technical thought leader on the development of scalable data systems
  • Develop front end applications to expose analytical data sets enterprise wide
  • Write highly tuned, scalable SQL queries running over large-scale, heterogeneous data warehouses
  • Work with the Product and Enterprise Engineering system teams to structure source systems for reporting consumption across the enterprise
  • Help maintain the CDC pipeline to power customer reporting
What we offer
  • Offers Equity
  • Medical benefits
  • 401(k) plan
  • Other company perk programs
  • Comprehensive medical, dental, and vision coverage, with 100% of employee-only benefit premiums covered for most medical plans
  • 16 weeks fully-paid Parental Leave for all new parents
  • Health & wellness stipend
  • Remote workspace, internet, and cellphone stipend
  • Commuter benefits for team members who report to the SF and NYC office
  • Family planning benefits
  • Fulltime

Machine Learning Data Engineer - Systems & Retrieval

As a Machine Learning Data Engineer - Systems & Retrieval, you will build and op...
Location: United States, Palo Alto
Salary: Not provided
Company: Zyphra
Expiration Date: Until further notice
Requirements
  • Strong software engineering background with fluency in Python
  • Experience designing, building, and maintaining data pipelines in production environments
  • Deep understanding of data structures, storage formats, and distributed data systems
  • Familiarity with indexing and retrieval techniques for large-scale document corpora
  • Understanding of database systems (SQL and NoSQL), their internals, and performance characteristics
  • Strong attention to security, access controls, and compliance best practices (e.g., GDPR, SOC 2)
  • Excellent debugging, observability, and logging practices to support reliability at scale
  • Strong communication skills and experience collaborating across ML, infra, and product teams
Job Responsibility
  • Design and implementation of distributed data ingestion and transformation pipelines
  • Building retrieval and indexing systems that support RAG and other LLM-based methods
  • Mining and organizing large unstructured datasets, both in research and production environments
  • Collaborating with ML engineers, systems engineers, and DevOps to scale pipelines and observability
  • Ensuring compliance and access control in data handling, with security and auditability in mind
What we offer
  • Comprehensive medical, dental, vision, and FSA plans
  • Competitive compensation and 401(k)
  • Relocation and immigration support on a case-by-case basis
  • On-site meals prepared by a dedicated culinary team
  • Thursday Happy Hours
  • Fulltime

Senior CVML Platform Engineer

We are seeking a Senior CVML Platform Engineer to help design, build, and evolve...
Location: United States
Salary: 160,000 - 287,000 USD / year
Company: Blue River Technology
Expiration Date: Until further notice
Requirements
  • 5+ years of professional engineering experience, with a focus on platform, infrastructure, or systems engineering
  • Strong technical judgment, balancing the evolution of legacy platforms with the design and delivery of new, greenfield components shared across multiple teams and workloads
  • Excellent Python skills, used in production systems, tooling, and platform components
  • Solid understanding of ML systems and the end-to-end model development lifecycle, from experimentation to deployment and iteration
  • Hands-on experience or strong familiarity with cloud platforms (AWS preferred) and container orchestration and workload scheduling systems such as Kubernetes and Slurm
  • Ability to partner effectively with ML engineers, infra teams, and product stakeholders to translate requirements into platform capabilities
  • Ability to quickly ramp up on new domains, tools, and complex existing systems
Job Responsibility
  • Design, build, and evolve platform capabilities that support ML training, batch inference, and model deployment workflows at scale
  • Own and improve core platform components (e.g., compute orchestration, data pipelines, inference systems) used by multiple teams across Blue River and John Deere
  • Continuously enhance platform reliability, scalability, and performance, with a focus on real-world ML workloads
  • Enable ML engineers to move faster by building intuitive, well-documented platform tools and workflows across the model lifecycle (experimentation, deployment, and iteration)
  • Improve model inference performance and throughput while balancing trade-offs among cost, latency, and reliability
  • Support and scale distributed training and inference systems, including frameworks such as Ray and related tooling
  • Develop and optimize hybrid compute environments (cloud + on-prem/GPU infrastructure) to support large-scale ML workloads
  • Build and maintain infrastructure leveraging Kubernetes, Slurm, and cloud platforms (AWS preferred)
  • Identify and resolve bottlenecks in compute, storage, and data movement pipelines
  • Evaluate existing platform systems and make thoughtful decisions on when to extend, refactor, or rebuild components
What we offer
  • Bonus and benefit programs
  • Fulltime