
AI/ML & Data Engineer

India, Chennai · Job Posted January 01, 2026

Job Responsibility

  • Data Ingestion and Preprocessing: Ability to build and maintain data pipelines that ingest unstructured data from PDFs, gazettes, HTML circulars, etc., and handle data extraction, parsing, and normalization.
  • NLP & LLM Modeling: Ability to fine-tune or prompt-tune LLMs for summarization, classification, and change detection in regulations. Ability to develop embeddings for semantic similarity.
  • Knowledge Graph Engineering: Ability to design entity relationships (regulation, control, policy) and implement retrieval over Neo4j or similar graph DBs.
  • Information Retrieval (RAG): Ability to build RAG pipelines for natural language querying of regulations.
  • Annotation and Validation: Ability to annotate training data in collaboration with SMEs and to validate model outputs.
  • MLOps: Ability to build CI/CD for model retraining, versioning, and evaluation (precision, recall, BLEU, etc.)
  • API and Integration: Ability to expose ML models as REST APIs (FastAPI) for integration with product frontend.

Requirements

  • 4–6 years of experience in ML/NLP, preferably in document-heavy domains (finance, legal, policy)
  • Languages: Python, SQL
  • AI/ML/NLP: Hugging Face Transformers, OpenAI API, spaCy, scikit-learn, LangChain, RAG, LLM prompt-tuning, LLM fine-tuning
  • Vector Search: Pinecone, Weaviate, FAISS
  • Data Engineering: Airflow, Kafka, OCR and PDF text extraction (Tesseract, pdfminer)
  • MLOps: MLflow, Docker

Similar Jobs for AI/ML & Data Engineer

8 matching positions

Data Engineer - AI/ML

Data sits at the heart of the company. This role ensures that Awin can fully lev...
Location: Romania, Iași
Salary: Not provided
Company: Awin Global
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or higher in Data Science, Data Engineering or a related field, preferably with a strong focus on mathematics, statistics, or data engineering
  • 2+ years of experience as a data engineer on AI/ML projects with Python
  • Strong experience using Databricks, including Jobs, Asset Bundles, Delta Lake, and MLflow, as well as Azure data engineering tools such as Azure Data Factory and Azure Data Lake Storage (ADLS)
  • Solid understanding of Scrum practices and a strong Agile mindset
  • Advanced proficiency in Python and its key data and ML libraries (NumPy, PySpark, scikit-learn, TensorFlow/PyTorch)
  • Working knowledge of generative AI models (e.g., ChatGPT, Claude), Databricks GenAI tooling (including embedding models), and modern data structures such as vector databases
  • Strong expertise in designing end-to-end ETL solutions using Databricks, including identifying, extracting, and curating datasets for machine learning model development
  • Strong knowledge of cloud platforms (Azure and AWS) and relevant services
  • Hands-on experience with big data technologies, including Databricks and Spark
  • Very strong analytical skills to translate business needs into actions, with a proactive approach to tasks and challenges, delivering projects on time and within budget
Job Responsibility
  • Write clean, elegant and maintainable code following data engineering and AI/ML best practices
  • Understand business objectives and develop models that help achieve them, and create and monitor business-relevant metrics
  • Find new ways to solve complex business problems with self-improving, automated predictive models
  • Evaluate existing models and recommend improvements for better results and performance efficiency
  • Develop best practices for building and orchestrating predictive models
  • Be responsible for quality, accuracy and interpretation of the result sets
What we offer
  • Flexi-Week and Work-Life Balance: a flexible four-day Flexi-Week at full pay and with no reduction to annual holiday allowance
  • Remote Working Allowance: a monthly allowance to cover part of your running costs
  • Flexi-Office: international culture and flexibility through Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Meal Vouchers: a certain net sum to spend on a variety of lunches
  • Health & Wellbeing: insurance covering several types of health, vision and/or dental treatments for you and for up to one additional family member
  • Remote Working Furniture Package: after 3 months of employment, eligible for a furniture package to set up a proper workplace at your remote working location
  • Appreciation: thank and reward colleagues by sending them a voucher through our peer-to-peer program
Employment Type: Full-time

Lead Data Engineer - AI/ML

The Lead Data Engineer will be part of a team building Stanford Health Care's (S...
Location: United States of America, Palo Alto
Salary: 94.35 - 125.03 USD / Hour
Company: Stanford Health Care
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related, or equivalent working experience
  • 5+ years of experience in building data infrastructure for analytics teams, including the ability to write code in SQL, R, or Python for processing large datasets in distributed cloud environments
  • Experience with cloud deployment strategies and CI/CD
  • Experience building and working with data infrastructure in a SaaS environment
  • Experience overseeing, developing or implementing machine learning operations (MLOps) processes
  • Experience mentoring junior engineers and enforcing best practices around code quality
  • Knowledge of multiple programming languages, commitment to choosing languages based on project-specific requirements, and willingness to learn new programming languages as necessary
  • Knowledge of resource management and automation approaches such as workflow runners
  • Collaborative mentality and excitement for iterative design working closely with the Data Science team.
Job Responsibility
  • Build end-to-end data pipelines and infrastructure for ML models used by the Data Science team and others at SHC
  • Understand the requirements of data processing and analysis pipelines and make appropriate technical design and interface decisions
  • Understand data flows among the SHC applications and use this knowledge to make recommendations and design decisions for languages, tools, and platforms used in software and data projects
  • Troubleshoot and debug environment and infrastructure problems found in production and non-production environments for projects by the Data Science Team
  • Work with other groups at SHC and the Technology and Digital Solutions (TDS) group to ensure server and system maintenance based on updates, system requirements, data usage, and security requirements.
Employment Type: Full-time

Senior Data & AI/ML Engineer - GCP Specialization Lead

We are on a bold mission to create the best software services offering in the wo...
Location: United States, Menlo Park
Salary: Not provided
Company: techjays
Expiration Date: Until further notice
Requirements
  • GCP Services: BigQuery, Dataflow, Pub/Sub, Vertex AI
  • ML Engineering: End-to-end ML pipelines using Vertex AI / Kubeflow
  • Programming: Python & SQL
  • MLOps: CI/CD for ML, Model deployment & monitoring
  • Infrastructure-as-Code: Terraform
  • Data Engineering: ETL/ELT, real-time & batch pipelines
  • AI/ML Tools: TensorFlow, scikit-learn, XGBoost
  • Min Experience: 10+ Years
Job Responsibility
  • Design and implement data architectures for real-time and batch pipelines, leveraging GCP services such as BigQuery, Dataflow, Dataproc, Pub/Sub, Vertex AI, and Cloud Storage
  • Lead the development of ML pipelines, from feature engineering to model training and deployment using Vertex AI, AI Platform, and Kubeflow Pipelines
  • Collaborate with data scientists to operationalize ML models and support MLOps practices using Cloud Functions, CI/CD, and Model Registry
  • Define and implement data governance, lineage, monitoring, and quality frameworks
  • Build and document GCP-native solutions and architectures that can be used for case studies and specialization submissions
  • Lead client-facing PoCs or MVPs to showcase AI/ML capabilities using GCP
  • Contribute to building repeatable solution accelerators in Data & AI/ML
  • Work with the leadership team to align with Google Cloud Partner Program metrics
  • Mentor engineers and data scientists toward achieving GCP certifications, especially in Data Engineering and Machine Learning
  • Organize and lead internal GCP AI/ML enablement sessions
What we offer
  • Best in class packages
  • Paid holidays and flexible paid time away
  • Casual dress code & flexible working environment
  • Medical Insurance covering self & family up to 4 lakhs per person

Software Engineer - Data Scientist AI/ML

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way...
Location: India, Bangalore
Salary: Not provided
Company: Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • BS/MS in Computer Science or Data Science, Electrical Engineering, Statistics, Applied Math or equivalent fields with strong mathematical background
  • General understanding of machine learning techniques and algorithms, including clustering, anomaly detection, optimization, neural networks, graph ML, etc.
  • Experience building data science-driven solutions including data collection, feature selection, model training, post-deployment validation
  • Strong hands-on coding skills (preferably in Python) for processing large-scale datasets and developing machine learning models
  • Familiarity with one or more machine learning or statistical modeling tools such as NumPy, scikit-learn, MLlib, TensorFlow
  • Works well in a team setting and is self-driven
Job Responsibility
  • Collaborate with the team to understand features, work with domain experts to identify relevant “signals” during feature engineering, and deliver generic and performant ML solutions
  • Keep up to date with newest technology trends
  • Communicate results and ideas to key decision makers
  • Implement new statistical or other mathematical methodologies as needed for specific models or analysis
  • Optimize joint development efforts through appropriate database use and project design
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type: Full-time

Data Engineer Lead (OT Data)

Data Engineer (OT Data) (Category - Engineer) Sector: Oil and Gas Location: Doha...
Location: Qatar, Doha
Salary: Not provided
Company: Codvo AI
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Engineering, Information Systems, or a related quantitative field
  • 5+ years of proven experience in a data engineering role
  • Experience within the oil and gas industry is highly preferred
  • Demonstrable experience building and operationalizing large-scale data pipelines and applications
Job Responsibility
  • Architect & Build Data Pipelines: Design, construct, install, test, and maintain highly scalable data management systems and ETL/ELT pipelines
  • Integrate Diverse Data Sources: Develop processes to ingest and integrate high-volume, high-velocity data from SCADA systems, historians (like OSIsoft PI, Aspen InfoPlus.21), DCS, PLC, and IoT sensors
  • Cloud Data Platform Development: Implement and manage data solutions on the Microsoft Azure cloud platform, leveraging services like Azure IoT Hub, Azure Event Hubs, and Azure Stream Analytics for real-time ingestion and processing of operational technology (OT) data
  • Data Modelling & Warehousing: Design and implement data models optimized for time-series data from industrial assets, supporting operational dashboards and real-time analytics
  • Enable Advanced AI: Build the data infrastructure to support AI/ML models for predictive maintenance, operational anomaly detection, and process optimization using real-time OT data
  • Champion Master Data Management (MDM): Design and implement MDM strategies and solutions to create a single, authoritative source of truth for critical data domains such as wells, equipment, and assets, ensuring data consistency across the enterprise
  • Ensure Data Quality & Governance: Implement robust data quality checks, validation rules, and monitoring to ensure the accuracy, consistency, and reliability of our data. Adhere to and help shape our data governance policies
  • Embrace Industry Standards: Champion and implement industry-specific data standards and models, such as the OSDU™ Data Platform, to ensure interoperability and a unified data view across the upstream lifecycle
  • Collaborate & Innovate: Work closely with a cross-functional team of geoscientists, drilling engineers, data scientists, and business analysts to understand their data needs and deliver effective solutions
  • Automate & Optimize: Identify opportunities for process automation and infrastructure optimization to improve data delivery, scalability, and cost-effectiveness
Employment Type: Full-time

Solution Data Engineer – DG Informatica Engineer

We are seeking a hands-on Informatica Developer with strong experience in data q...
Location: United States of America, Tempe
Salary: Not provided
Company: Circle K
Expiration Date: Until further notice
Requirements
  • 3–5+ years of hands-on experience as an Informatica Developer or Data Quality Engineer
  • Strong experience with Informatica IDMC, including Cloud Data Quality (CDQ) and Cloud Data Catalog (EDC/CDGC)
  • Experience implementing data quality validation rules, monitoring, and reporting
  • Hands-on experience with Snowflake and Databricks
  • Experience working in Azure cloud environments (AWS exposure is a plus)
  • Strong SQL skills and understanding of data warehousing and data modeling concepts
  • Experience working with on-prem and cloud data sources
  • Experience configuring Informatica Secure Agents and cloud connectivity
  • Familiarity with REST APIs and system integrations
Job Responsibility
  • Design, develop, and maintain data quality rules, scorecards, and dashboards using Informatica CDQ
  • Implement and support metadata ingestion, cataloging, and lineage using Informatica Cloud Data Catalog (EDC/CDGC)
  • Onboard application and data source metadata in collaboration with application owners and data engineering teams
  • Configure and maintain technical and business assets, custom attributes, and custom lineage within the Informatica catalog
  • Extract and publish metadata from cloud and on-prem data sources to support enterprise metadata management
  • Support data lineage and impact analysis across Snowflake, Databricks, and relational databases
  • Configure and manage Informatica Cloud environments, including Secure Agents and connectivity
  • Integrate Informatica with cloud platforms and external systems using REST APIs
  • Partner with data engineering and analytics teams to ensure high-quality, well-governed datasets are available for reporting and AI/ML use cases

Data Engineer

Wells Fargo is seeking a Data Engineer to design, build and optimize scalable da...
Location: India, Bengaluru
Salary: Not provided
Company: Wells Fargo
Expiration Date: Until further notice
Requirements
  • 2+ years of Data Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Strong experience in EDL pipelines using Spark & BigQuery on GCP
  • Hands-on expertise in GCP services (BigQuery, Dataflow, GCS, IAM) and cloud-native architecture
  • Solid understanding of enterprise data governance and data quality
  • Exposure to AI/ML or GenAI use cases, including integration with data platforms and awareness of AI governance standards
Job Responsibility
  • Participate in data warehouses across multiple databases and identify opportunities for developing table schemas within the Technology and Data area
  • Review and analyze basic data management business, operational, or technical assignments or challenges that require research, evaluation, and selection of alternatives and exercise independent judgment to guide medium risk deliverables
  • Present recommendations for resolving more complex data management situations and exercise independent judgment while developing expertise in the business data policies, procedures and compliance requirements
  • Collaborate and consult with technology colleagues, internal partners and stakeholders, including internal and external customers if applicable
  • Design and build scalable, production-grade data pipelines on GCP supporting analytics and AI workloads
  • Develop and manage secure, compliant cloud data platforms aligned with enterprise policies and controls
  • Optimize data processing for performance, cost efficiency, and reliability (Spark + BigQuery workloads)
  • Collaborate with AI/analytics teams to enable feature engineering, model consumption, and AI-driven insights
Employment Type: Full-time

Data Engineer – Lead

Location: India, Bengaluru Urban
Salary: Not provided
Company: NTT DATA
Expiration Date: Until further notice
Requirements
  • Strong hands-on experience with Microsoft Fabric, including Lakehouse, Warehouse, OneLake, Pipelines, Dataflows Gen2, Notebooks, and Power BI integration
  • Expertise in ETL/ELT, data pipelines, distributed data processing, and cloud-scale data engineering
  • Strong SQL, Python, PySpark, and data modeling skills
  • Experience with Lakehouse, Warehouse, and Medallion Architecture
  • Understanding of Delta tables, dimensional modeling, star schema, facts, dimensions, and curated analytical datasets
  • Experience integrating structured, semi-structured, file-based, API-based, enterprise application, and cloud data sources
  • Experience with data quality, reconciliation, logging, monitoring, and error-handling frameworks
  • Experience leading technical teams and coordinating onshore/offshore delivery
  • Experience with Git, CI/CD, Azure DevOps, branching, code reviews, and release management
  • Good to Have: Experience with Azure Data Factory, Synapse, Databricks, ADLS Gen2, Azure SQL, Microsoft Purview, or related Azure services
Job Responsibility
  • Lead the design and implementation of scalable data pipelines and data processing frameworks in Microsoft Fabric
  • Define data engineering standards, development practices, naming conventions, coding guidelines, and reusable technical patterns
  • Lead implementation of Bronze, Silver, and Gold layers in the Medallion Architecture
  • Oversee ingestion, transformation, orchestration, validation, and publication of data from multiple enterprise, clinical, operational, and cloud-based sources
  • Guide development of Fabric Pipelines, Dataflows Gen2, Notebooks, Lakehouse tables, Warehouse objects, and curated datasets
  • Ensure scalability, performance, reliability, maintainability, security, monitoring, and optimization of data solutions
  • Define standards for data quality, reconciliation, logging, error handling, auditability, and lineage
  • Conduct technical design reviews, code reviews, performance reviews, and deployment readiness reviews
  • Mentor and guide data engineering teams across onshore/offshore locations
  • Collaborate with architects, platform engineers, BI teams, QA teams, AI/ML teams, functional consultants, and stakeholders