CrawlJobs

Multimodal AI Engineer, Document Understanding Jobs (Hybrid work)

7 Job Offers

AI Content Engineer (New)
Company: LlamaIndex (llamaindex.ai)
Location: United States, San Francisco
Salary: Not provided
Expiration Date: Until further notice

Senior Data Engineer - AI Focused
Company: Doctolib (doctolib.fr)
Location: France, Paris
Salary: Not provided
Expiration Date: Until further notice

Senior AI Engineer (ML/DL)
Company: Volkswagen AG (volkswagen-group.com)
Location: United States, Belmont
Salary: 170,000 - 210,000 USD per year
Expiration Date: Until further notice

Senior Computer Vision and Machine Learning Research Scientist
Company: Axon (axon.com)
Location: United States, Seattle
Salary: 159,750 - 234,300 USD per year
Expiration Date: Until further notice

Multimodal AI Engineer, Document Understanding
Company: LlamaIndex (llamaindex.ai)
Location: United States, San Francisco
Salary: Not provided
Expiration Date: Until further notice

Senior AI Engineer
Company: Volkswagen AG (volkswagen-group.com)
Location: United States, Belmont
Salary: 170,000 - 210,000 USD per year
Expiration Date: Until further notice

A Multimodal AI Engineer specializing in Document Understanding builds systems that let machines read, interpret, and extract meaning from documents much as a human reader would. The role sits at the intersection of Computer Vision (CV) and Natural Language Processing (NLP) and requires expertise in both visual and textual data analysis. These professionals are in high demand, especially those who can bridge the gap between cutting-edge research and scalable, real-world applications. Their core mission is to transform unstructured document data, such as PDFs, scanned images, presentations, and forms, into structured, actionable information, automating complex workflows in industries such as finance, legal, healthcare, and logistics.

Typical responsibilities span several areas. Engineers design, train, and optimize machine learning models for layout analysis, optical character recognition (OCR) enhancement, table extraction, form field identification, and semantic understanding of document content. A significant part of the role is building robust data pipelines to curate and preprocess diverse document datasets, and developing rigorous evaluation frameworks that measure model performance accurately. These engineers also deploy models into production ML systems and ensure they are reliable, efficient, and scalable enough to process millions of documents. Collaboration is key: they work closely with product and software engineering teams to integrate AI capabilities into user-facing applications and APIs.

The skill set and requirements for these jobs are demanding and interdisciplinary.
A strong foundation in software engineering, particularly in Python, is essential, along with deep knowledge of modern ML frameworks such as PyTorch or TensorFlow. Candidates are expected to have hands-on experience training and fine-tuning models, especially Vision-Language Models (VLMs), transformers, and other architectures designed for multimodal tasks. Proficiency in computer vision techniques for image segmentation and object detection, alongside NLP techniques for entity recognition and language modeling, is crucial. Familiarity with ML operations (MLOps) tools for model deployment, monitoring, and lifecycle management is increasingly important. Successful engineers in this role can also prototype ideas quickly, experiment rigorously, and translate findings from academic papers into practical solutions. A problem-solving mindset, a strong mathematical background, and the ability to explain complex technical concepts clearly round out the profile. These roles are central to unlocking the value held in the large volumes of documents that organizations accumulate.
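As an illustration of the kind of evaluation framework described above for form field identification, the sketch below scores one document's extracted key-value fields against a gold annotation using exact-match precision, recall, and F1. It is a minimal sketch under stated assumptions, not any listed employer's method: the function names (`normalize`, `score_fields`), the field names, and the normalization rule (collapse whitespace, ignore case) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldScores:
    precision: float
    recall: float
    f1: float


def normalize(value: str) -> str:
    """Hypothetical normalization: collapse runs of whitespace and ignore case."""
    return " ".join(value.split()).lower()


def score_fields(predicted: dict[str, str], gold: dict[str, str]) -> FieldScores:
    """Exact-match, field-level precision/recall/F1 for a single document.

    A predicted field counts as correct only if the same key exists in the
    gold annotation and the normalized values are identical.
    """
    correct = sum(
        1
        for key, value in predicted.items()
        if key in gold and normalize(value) == normalize(gold[key])
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return FieldScores(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical invoice fields; "total" differs only in number formatting.
    gold = {"invoice_number": "INV-001", "total": "1,250.00", "due_date": "2024-07-01"}
    predicted = {"invoice_number": "inv-001", "total": "1250.00", "vendor": "Acme"}
    print(score_fields(predicted, gold))
```

In this example only `invoice_number` matches, so precision, recall, and F1 are all one third. The `total` field shows a limitation of pure exact matching: a formatting difference counts as an error, which is why a production evaluation framework would likely add type-aware normalization (numbers, dates) and aggregate scores across many documents.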
