ML / AI Data Engineer

India · Job Posted May 16, 2026

Job Description

We are looking for a highly skilled Senior ML / Data Pipeline Engineer who can translate complex machine learning and multimodal concepts into scalable, production-ready pipelines and workflows. This role focuses on building and optimizing large-scale video and multimodal data systems, enabling high-throughput ingestion, processing, and model training across distributed cloud environments.

Job Responsibility

  • Design, deploy, and scale large-scale ML and data processing pipelines across cloud infrastructure
  • Build systems to ingest, process, and serve 250,000+ hours of multimodal data (video, audio, metadata)
  • Architect and optimize GPU-based compute environments (e.g., NVIDIA Tesla clusters) for distributed training and inference
  • Develop high-throughput backend systems for video ingestion from desktop and mobile platforms
  • Implement distributed processing workflows, including job scheduling, fault tolerance, and resource allocation
  • Design and build human-in-the-loop and automated annotation systems to ensure data quality and scalability
  • Translate ML and multimodal research into scalable, production-grade cloud architectures
  • Optimize pipelines for performance, reliability, and cost efficiency across compute, storage, and networking layers
  • Collaborate with ML, data, and engineering teams to deliver end-to-end data workflows

Requirements

  • 5+ years of experience in data engineering, ML pipelines, or distributed systems
  • Strong experience building scalable data pipelines for large datasets (video/audio preferred)
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP)
  • Experience working with GPU-based environments and distributed computing
  • Strong programming skills in Python, Scala, or similar languages
  • Experience with data processing frameworks (Spark, Ray, Kafka, Airflow, or similar)
  • Understanding of ML workflows, training pipelines, and inference systems
  • Experience designing fault-tolerant, high-availability systems
  • Strong knowledge of data storage systems (data lakes, object storage, distributed file systems)
  • Ability to handle high-throughput, large-scale data ingestion and processing

Nice to have

  • Experience with multimodal AI (video, audio, NLP) systems
  • Familiarity with annotation tools and data labeling workflows
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Knowledge of cost optimization strategies for large-scale cloud workloads

Similar Jobs for ML / AI Data Engineer

8 matching positions

Senior Staff Data Engineer- ML & AI Platform

At Marktplaats, data is at the heart of everything we do, but Intelligence is wh...
Location: Netherlands, Amsterdam
Salary: Not provided
Adevinta
Expiration Date: Until further notice
Requirements
  • 10+ years of experience with a specific focus on the intersection of Data Engineering, MLOps, and AI Infrastructure
  • Deep knowledge of Spark internals, structured streaming, and performance tuning for large-scale data processing
  • Proven experience architecting end-to-end ML platforms for Traditional ML (Classic MLOps) while actively enabling the organization on Generative AI concepts
  • Strong background in building automated pipelines and ensuring system observability
  • Practical experience building infrastructure for Large Language Models, including managing the complexity of chaining models and tools
  • Solid experience serving models at low latency and high concurrency using containerized solutions
  • Ability to speak the language of AI/ML Engineers and effectively bridge the gap between experimental code and production systems
  • Expert level Python
  • Experience with PyTorch, Terraform, Terragrunt, Docker, Kubernetes, GitHub Actions, Datadog
  • Experience with Databricks AI Stack: MLflow, Mosaic AI, Unity Catalog, Feature Store, Databricks Model Serving, Vector Databases
Job Responsibility
  • Lead the evolution of our Machine Learning & AI Platform, designing the architecture for AI Agents and establishing patterns for Vector Databases
  • Act as a first mover: validate new Databricks features and integrate them into the platform
  • Write the guidelines for GenAI development, helping teams transition from notebook experiments to production-grade LLM applications
  • Design the Feature Store, manage the Model Registry, and set up the infrastructure for Vector Search and RAG (Retrieval Augmented Generation) workflows
  • Elevate the technical bar of the team, mentoring Staff and Senior engineers on design patterns, code quality, and architectural decisions
  • Translate complex requirements from ML Engineers and Data Scientists into robust engineering tickets and infrastructure roadmaps
What we offer
  • An attractive Base Salary
  • Participation in our Short Term Incentive plan (annual bonus)
  • Work From Anywhere: Enjoy up to 20 days a year of working from anywhere
  • A 24/7 Employee Assistance Program for you and your family
  • Fulltime

Senior Staff Data Engineer- ML & AI Platform

At Marktplaats, data is at the heart of everything we do, but Intelligence is wh...
Location: Netherlands, Amsterdam
Salary: Not provided
Adevinta
Expiration Date: Until further notice
Requirements
  • 10+ years of experience with a specific focus on the intersection of Data Engineering, MLOps, and AI Infrastructure
  • Deep knowledge of Spark internals, structured streaming, and performance tuning for large-scale data processing
  • Proven experience architecting end-to-end ML platforms for Traditional ML (Classic MLOps) while actively enabling the organization on Generative AI concepts
  • Strong background in building automated pipelines and ensuring system observability
  • Practical experience building infrastructure for Large Language Models, including managing the complexity of chaining models and tools
  • Solid experience serving models at low latency and high concurrency using containerized solutions
  • Ability to speak the language of AI/ML Engineers and effectively bridge the gap between experimental code and production systems
Job Responsibility
  • Lead the evolution of our Machine Learning & AI Platform, designing the architecture for AI Agents and establishing patterns for Vector Databases
  • Act as a first mover: validate new Databricks features and integrate them into the platform
  • Write the guidelines for GenAI development, helping teams transition from notebook experiments to production-grade LLM applications
  • Design the Feature Store, manage the Model Registry, and set up the infrastructure for Vector Search and RAG (Retrieval Augmented Generation) workflows
  • Elevate the technical bar of the team, mentoring Staff and Senior engineers on design patterns, code quality, and architectural decisions
  • Translate complex requirements from ML Engineers and Data Scientists into robust engineering tickets and infrastructure roadmaps
What we offer
  • An attractive Base Salary
  • Participation in our Short Term Incentive plan (annual bonus)
  • Work From Anywhere: Enjoy up to 20 days a year of working from anywhere
  • A 24/7 Employee Assistance Program for you and your family
  • A collaborative environment with an opportunity to explore your potential and grow
  • Fulltime

Senior Engineer / Lead Engineer - Virtual Engineering - AI ML

Sponsorship:  GM DOES NOT PROVIDE IMMIGRATION-RELATED SPONSORSHIP FOR THIS ROLE....
Location: India, Bengaluru
Salary: Not provided
General Motors
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Mechanical, Automobile, Production, or Mechatronics Engineering, or a similar discipline
  • 5+ years of experience in Automotive Manufacturing / Manufacturing Engineering
  • 1+ year of experience implementing AI/ML solutions in automotive use cases
  • Should have executed at least 2 end-to-end projects in the text or image data domain (from problem definition to deployment)
  • Strong programming skills in Python
  • Proficiency with ML/DL frameworks like Scikit-learn, TensorFlow, PyTorch, XGBoost
  • Solid understanding of statistics, probability, and linear algebra
  • Experience in data preprocessing, feature engineering, ETL and Exploratory Data Analysis (EDA)
  • Experience with MLOps platforms (MLflow, Kubeflow, Vertex AI, Azure ML)
  • Knowledge of ML model evaluation
Job Responsibility
  • Collaborate with stakeholders to understand business problems in the Manufacturing Engineering and Operations space and solve them using ML methodologies
  • Design, develop, and fine-tune AI/ML models for classification, regression, clustering, and recommendation systems
  • Work with MLOps tools to automate workflows, CI/CD pipelines, and model monitoring
  • Evaluate, validate, and benchmark model performance using appropriate metrics
  • Deploy AI models into production environments in collaboration with IT/AI teams
  • Establish monitoring and maintenance processes to ensure model accuracy over time
  • Ensure that all AI solutions comply with organizational data security, confidentiality, and regulatory requirements
  • Document workflows, results, and lessons learned for organizational knowledge sharing
  • Stay updated on advancements in ML model evaluation, ML frameworks, and end-to-end ML pipelines
  • Fulltime

Senior Databricks ML & AI Engineer

Salary: Not provided
Myticas Consulting
Expiration Date: Until further notice
Requirements
  • 5+ years of hands-on Machine Learning Engineering, MLOps, or AI Engineering experience within enterprise production environments
  • Extensive experience designing and deploying Databricks Lakehouse solutions using Delta Lake, Unity Catalog, MLflow, Databricks SQL, Workflows, and Delta Live Tables
  • Strong programming expertise in Python, PySpark, Pandas, NumPy, and modern software engineering best practices
  • Experience building, training, deploying, and monitoring machine learning models using PyTorch, TensorFlow, scikit-learn, XGBoost, or similar ML frameworks
  • Proven experience implementing end-to-end MLOps pipelines including experiment tracking, model registry, automated retraining, model deployment, and production monitoring
  • Hands-on experience developing Generative AI (GenAI) and Large Language Model (LLM) solutions, including RAG architectures, prompt engineering, LangChain, LlamaIndex, or Databricks Mosaic AI
  • Experience implementing Vector Search, embeddings, semantic search, and AI retrieval pipelines using Databricks or similar vector database technologies
  • Strong understanding of Apache Spark internals, distributed computing, performance tuning, partitioning, memory management, and large-scale data processing
  • Experience with cloud platforms including AWS, Azure, or Google Cloud Platform, supporting enterprise AI and machine learning workloads
  • Experience implementing CI/CD pipelines, GitHub Actions, Azure DevOps, Databricks Asset Bundles, and modern DevOps automation practices

AI / ML Engineer

Nextera Robotics is a fast-growing Physical AI company founded at MIT. We build ...
Location: United States, Boston
Salary: 130000.00 - 180000.00 USD / Year
Helpcare AI
Expiration Date: Until further notice
Requirements
  • Proven academic or industry experience building computer vision systems, including areas such as object detection, segmentation, visual understanding, or multimodal models
  • Strong Python skills, with the ability to build, test, and iterate quickly in a production-oriented environment
  • Proficiency using modern AI-assisted development tools such as coding agents, GitHub Copilot, or similar systems to accelerate implementation, debugging, and iteration
  • Solid problem-solving skills, strong ownership, and the ability to communicate clearly across technical and non-technical teams
  • Familiarity with data pipelines, model evaluation, and real-world ML workflows
Job Responsibility
  • Build and productionize advanced computer vision and multimodal AI systems for real-world industrial deployment - spanning detection, segmentation, visual understanding, and VLM/LLM-powered reasoning
  • Turn large-scale visual data collected by autonomous robots into actionable insights for progress tracking, safety monitoring, and project intelligence
  • Develop AI systems deployed across robotics, construction, energy, and telecommunications
  • Work closely with engineers and annotators to expand and improve high-quality training datasets
  • Support and scale machine learning operations (MLOps)
  • Build and maintain the cloud infrastructure and ML pipelines that power training, evaluation, and deployment
What we offer
  • Competitive salary based on experience, plus great perks
  • Join a YC-backed startup shaping the future of robotics
  • Fulltime

AI Data Engineer

We are looking for a technically sharp and detail-oriented Data Engineer to join...
Location: India, Bengaluru
Salary: Not provided
Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering, Mathematics, or a related discipline
  • 4 – 5 years of hands-on experience in data engineering, ETL development, or analytics engineering roles
  • Demonstrable experience with Databricks and/or Microsoft Fabric in a production environment
  • Proficiency in Power BI report and semantic model development
  • Exposure to Collibra or equivalent data governance / cataloguing platforms is strongly preferred
  • Strong SQL and Python skills
  • PySpark experience is required
  • Familiarity with Azure cloud services and DevOps practices for data pipeline deployment
Job Responsibility
  • Design, build, and maintain scalable ETL/ELT pipelines using Azure Data Factory, Databricks (PySpark / Delta Live Tables), and Microsoft Fabric Data Factory
  • Transform raw, multi-source data into clean, conformed, and analytics-ready datasets following Medallion Architecture principles (Bronze → Silver → Gold)
  • Develop and optimize SQL and PySpark-based transformation logic for structured, semi-structured, and unstructured data
  • Implement incremental load patterns, merge/upsert logic, and slowly changing dimension (SCD) strategies to support historical data tracking
  • Collaborate with the AI Engineers to prepare high-quality feature datasets for ML and LLM use cases
  • Define, implement, and monitor data quality rules including completeness, accuracy, consistency, timeliness, and uniqueness checks
  • Administer and extend the Collibra data governance platform — including business glossary management, data lineage documentation, and stewardship workflows
  • Build automated data quality validation frameworks using tools such as Great Expectations, dbt tests, or Unity Catalog data quality constraints in Databricks
  • Triage and resolve data quality incidents, root-cause data anomalies, and communicate impact to stakeholders proactively
  • Maintain metadata catalogues and ensure all critical datasets have documented ownership, lineage, and classification
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Senior AI & Data Engineer

The Senior AI & Data Engineer is an individual contributor role that acts as the...
Location: India, Bengaluru
Salary: Not provided
Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Data Science, AI/ML, Engineering, Mathematics, or a related technical discipline
  • PhD is a plus
  • 7 – 10 years of hands-on experience in AI/ML engineering, applied data science, or LLM engineering roles
  • Proven track record of delivering production AI systems
  • Deep expertise with at least two major LLM platforms (Claude, GPT, Gemini, or equivalent)
  • Significant experience with Collibra or an equivalent enterprise data governance platform
  • Demonstrated experience leading cross-functional AI initiatives and mentoring junior engineers
  • Strong ML fundamentals alongside modern generative AI skills
  • Experience with responsible AI practices, including fairness auditing, explainability, and content safety, is strongly preferred
Job Responsibility
  • Serve as the dual AI & data SME for the team and organization
  • Define and uphold engineering standards, design patterns, and best practices across both AI and data engineering disciplines
  • Lead technical discovery for new AI and data use cases
  • Participate in and lead cross-functional initiatives where AI and data strategy intersect
  • Mentor and upskill the Applied AI Engineer and AI Data Engineer
  • Architect and deliver complex agentic AI systems
  • Design and implement advanced RAG architectures
  • Lead LLM evaluation frameworks
  • Assess and implement LLM fine-tuning and alignment strategies
  • Own LLM integration architecture
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Senior ML Engineer - AI Platform & Agents

We are building agentic AI into the core of our product and need someone who can...
Location: France, Bordeaux
Salary: Not provided
PhantomBuster
Expiration Date: Until further notice
Requirements
  • 5+ years of experience as an ML Engineer, AI Engineer, or Software Engineer with a strong AI focus
  • Hands-on experience building AI agents using frameworks such as LangChain, Amazon Bedrock AgentCore, or similar
  • Strong understanding of LLM-based systems: prompt engineering, agent orchestration, tool use, and multi-agent workflows
  • Familiarity with MCP (Model Context Protocol) and experience integrating agents with external APIs or data sources
  • Experience working with Agents for Amazon Bedrock AgentCore or similar agent setups
  • Strong understanding of machine learning algorithms, statistical methods, and data preprocessing techniques
  • Experience with cloud platforms for model training and deployment, especially AWS
  • Proficiency in Python, including LangChain, and standard data libraries (Pandas, NumPy, etc.)
  • Fluency in English
Job Responsibility
  • Define and evolve our infrastructure to allow for better ML and AI capabilities, with a focus on LLM-based and agentic systems
  • Contribute to the development and expansion of our agentic AI framework powered by AWS Bedrock, enabling both internal tools and customer-facing features
  • Identify, source, and refine datasets to allow tuning models, powering retrieval pipelines, or expanding agentic workflows
  • Pre-process data by using techniques such as data cleaning, feature engineering, and transformation
  • Train, evaluate, and deploy both LLM-based systems and traditional machine learning models into production
  • Monitor, debug, and continuously improve deployed models and AI tools
  • Support machine learning usage throughout the company, including selecting the right modeling approach for the use case (LLM vs. traditional ML)
  • Support the integration and use of LLMs, including approaches such as fine-tuning, prompt tuning, and retrieval-augmented generation (RAG), to improve accuracy
What we offer
  • International team
  • Fun team building events
  • €40/month for remote work
  • Flexible working time
  • Home office budget up to €1500
  • 100% of an Alan Blue subscription
  • Lunch vouchers: €8 per worked day (50% covered by The Phantom Company)
  • Partnership with MokaCare
  • €70 a month benefit for entertainment expenses
  • Book Allowance and Sharing Program