Vision-Language Model (VLM) Engineer

Turkey, Istanbul · Job Posted April 22, 2026

Job Description

We are seeking a highly skilled Vision-Language Model (VLM) Engineer to design, train, and deploy state-of-the-art multimodal AI systems for image, text, and beyond. You will work at the intersection of computer vision and natural language processing in a collaborative, impact-driven team, contributing to products that combine image and text understanding.

Job Responsibility

  • Design and implement vision-language models for tasks such as image captioning, visual question answering, and cross-modal retrieval
  • Train, fine-tune, and evaluate multimodal models using large-scale datasets
  • Optimize model performance for scalability and real-world deployment
  • Collaborate with cross-functional teams including data scientists, software engineers, and product managers
  • Stay up to date with the latest research in multimodal AI and apply it to production systems

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, or a related field
  • Strong experience with Python and deep learning frameworks (e.g., PyTorch or TensorFlow)
  • Solid understanding of machine learning, computer vision, and NLP concepts
  • Experience with multimodal models or related architectures (e.g., transformers)
  • Familiarity with handling large datasets and distributed training

Nice to have

  • Experience with models such as CLIP, BLIP, or similar multimodal architectures
  • Knowledge of model deployment (Docker, APIs, cloud services)
  • Publications or contributions to AI research projects
  • Experience working with real-world AI applications

Similar Jobs for Vision-Language Model (VLM) Engineer

8 matching positions

Staff Research Scientist - VLM / VLA

At General Motors, our product teams are redefining mobility. Through a human-ce...
Location
United States, Mountain View
Salary
218800.00 - 335300.00 USD / Year
General Motors
Expiration Date
Until further notice
Requirements
  • Ph.D. in Machine Learning, Robotics, Computer Science, Electrical Engineering, or a related technical field
  • 5+ years of experience in AI/ML research and applied development
  • Deep expertise in modern ML architectures (transformers, generative AI, multimodal systems)
  • Strong programming skills in Python
  • Excellent communication, collaboration, and mentoring abilities, comfortable influencing technical strategy and guiding ML excellence across the organization
Job Responsibility
  • Research, design, and prototype advanced Vision-Language Models and Vision-Language-Action foundational models tailored for real-time semantic understanding and behavioral prediction in autonomous driving
  • Drive the technical strategy for onboard model optimization, leading initiatives in model quantization, pruning, knowledge distillation, and compilation to ensure high-parameter models execute with ultra-low latency on vehicle edge hardware
  • Advance multimodal alignment techniques, ensuring seamless integration of camera, radar, LiDAR, and textual/logical prompts into unified foundational architectures
  • Influence technical roadmaps and shape strategic machine learning priorities that align with safety requirements, core product milestones, and next-generation vehicle launches
  • Provide technical mentorship and long-term vision to a multidisciplinary group of machine learning engineers, software developers, and hardware specialists
  • Foster internal innovation by collaborating closely with perception, planning, and infrastructure teams to integrate foundational models into the core autonomous software stack
  • Represent the company externally to the global scientific community by publishing original research, securing patents, and presenting at top-tier artificial intelligence and robotics conferences
What we offer
  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation & holidays
  • Tuition assistance programs
Employment type: Full-time

Applied Researcher

Aurora’s mission is to deliver the benefits of self-driving technology safely, q...
Location
United States, Mountain View; Pittsburgh; San Francisco; Seattle
Salary
Not provided
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • PhD Graduate in AI, Computer Science, or Robotics (Top-tier research lab background)
  • Deep expertise in Vision-Language Models (VLM) and Vision-Language-Action (VLA) models
  • Hands-on experience with 'jagged edges': knowing where current architectures fail and how to troubleshoot them
  • Practical researcher: Ability to translate complex papers into production-ready code
  • Collaborative Guide: Willingness to mentor existing team members and bridge the knowledge gap
  • Excitement for large-scale infrastructure and supercomputing environments (e.g., Aurora)
  • Architectural Creativity: Ability to rethink and redo entire model architectures rather than just fine-tuning
  • Strong programming skills in Python
  • Strong depth in modern C++ programming preferred
Job Responsibility
  • Architectural Exploration: Deep dive into VLM/VLA architectures to determine their viability for our specific problem sets
  • Technical Mentorship: Act as the subject matter expert and guide for an exceptionally capable engineering team eager to move into this space
  • Practical Implementation: Bridge the gap between high-level academic theory and practical, scalable applications
  • Innovation: Redesign and optimize model architectures to leverage high-performance computing environments like Aurora

Applied Researcher

Location
United States, Seattle
Salary
Not provided
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • PhD Graduate in AI, Computer Science, or Robotics (Top-tier research lab background)
  • Deep expertise in Vision-Language Models (VLM) and Vision-Language-Action (VLA) models
  • Hands-on experience with 'jagged edges'—knowing where current architectures fail and how to troubleshoot them
  • Practical researcher: Ability to translate complex papers into production-ready code
  • Collaborative Guide: Willingness to mentor existing team members and bridge the knowledge gap
  • Excitement for large-scale infrastructure and supercomputing environments (e.g., Aurora)
  • Architectural Creativity: Ability to rethink and redo entire model architectures rather than just fine-tuning
  • Strong programming skills in Python
  • Strong depth in modern C++ programming preferred
Job Responsibility
  • Architectural Exploration: Deep dive into VLM/VLA architectures to determine their viability for our specific problem sets
  • Technical Mentorship: Act as the subject matter expert and guide for an exceptionally capable engineering team eager to move into this space
  • Practical Implementation: Bridge the gap between high-level academic theory and practical, scalable applications
  • Innovation: Redesign and optimize model architectures to leverage high-performance computing environments like Aurora
Employment type: Full-time

Applied Researcher

Who we are: Aurora’s mission is to deliver the benefits of self-driving technolo...
Location
United States, San Francisco
Salary
Not provided
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • PhD Graduate in AI, Computer Science, or Robotics (Top-tier research lab background)
  • Deep expertise in Vision-Language Models (VLM) and Vision-Language-Action (VLA) models
  • Hands-on experience with 'jagged edges'—knowing where current architectures fail and how to troubleshoot them
  • Practical researcher: Ability to translate complex papers into production-ready code
  • Collaborative Guide: Willingness to mentor existing team members and bridge the knowledge gap
  • Excitement for large-scale infrastructure and supercomputing environments (e.g., Aurora)
  • Architectural Creativity: Ability to rethink and redo entire model architectures rather than just fine-tuning
  • Strong programming skills in Python
  • Strong depth in modern C++ programming preferred
Job Responsibility
  • Architectural Exploration: Deep dive into VLM/VLA architectures to determine their viability for our specific problem sets
  • Technical Mentorship: Act as the subject matter expert and guide for an exceptionally capable engineering team eager to move into this space
  • Practical Implementation: Bridge the gap between high-level academic theory and practical, scalable applications
  • Innovation: Redesign and optimize model architectures to leverage high-performance computing environments like Aurora

Applied Researcher

Aurora’s mission is to deliver the benefits of self-driving technology safely, q...
Location
United States, Pittsburgh
Salary
Not provided
Aurora Innovation
Expiration Date
Until further notice
Requirements
  • PhD Graduate in AI, Computer Science, or Robotics (Top-tier research lab background)
  • Deep expertise in Vision-Language Models (VLM) and Vision-Language-Action (VLA) models
  • Hands-on experience with 'jagged edges'—knowing where current architectures fail and how to troubleshoot them
  • Practical researcher: Ability to translate complex papers into production-ready code
  • Collaborative Guide: Willingness to mentor existing team members and bridge the knowledge gap
  • Excitement for large-scale infrastructure and supercomputing environments (e.g., Aurora)
  • Architectural Creativity: Ability to rethink and redo entire model architectures rather than just fine-tuning
  • Strong programming skills in Python
  • Strong depth in modern C++ programming preferred
Job Responsibility
  • Architectural Exploration: Deep dive into VLM/VLA architectures to determine their viability for our specific problem sets
  • Technical Mentorship: Act as the subject matter expert and guide for an exceptionally capable engineering team eager to move into this space
  • Practical Implementation: Bridge the gap between high-level academic theory and practical, scalable applications
  • Innovation: Redesign and optimize model architectures to leverage high-performance computing environments like Aurora

Machine Learning Scientist II - Gen AI

We are seeking a highly motivated and experienced Machine Learning Scientist to ...
Location
United States, Boston
Salary
127300.00 - 186700.00 USD / Year
SimpliSafe
Expiration Date
Until further notice
Requirements
  • MS or PhD in Computer Science, Artificial Intelligence, or a related field
  • Experience training or fine-tuning large language models (LLMs) using modern frameworks
  • Strong grasp of deep learning, particularly transformer architectures and foundational model training techniques for text and vision modalities
  • Proficient in Python and relevant ML libraries (e.g., PyTorch, TensorFlow, HuggingFace Transformers)
  • Hands-on experience in developing and deploying LLM- or VLM-powered applications
  • Familiarity with prompt engineering, retrieval-augmented generation (RAG), MCP (Model Context Protocol), agentic AI, and evaluation of generative models
  • Understanding of MLOps practices and how to scale experiments into production-grade solutions
  • Strong communication and documentation skills
  • Collaborative mindset with the ability to thrive in a fast-paced, interdisciplinary environment
Job Responsibility
  • Develop and fine-tune large language models (LLMs) and vision-language models (VLMs) to address real-world challenges in the home security space
  • Work with key stakeholders to identify research initiatives that can have an impact on business outcomes
  • Take research initiatives from idea generation to production
  • Collaborate with engineers and product managers to integrate capabilities into our existing systems
  • Stay up-to-date on the latest advancements in LLMs, VLMs, and multimodal systems. Evaluate new techniques for potential adoption and improvement of internal capabilities
What we offer
  • A mission- and values-driven culture and a safe, inclusive environment where you can build, grow and thrive
  • A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families
  • Free SimpliSafe system and professional monitoring for your home
  • Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change
  • Participation in our annual bonus program, equity, and other forms of compensation, in addition to a full range of medical, retirement, and lifestyle benefits
Employment type: Full-time

Senior Machine Learning Engineer

We are seeking a Senior Machine Learning Engineer to bridge the gap between adva...
Location
Switzerland, Zürich
Salary
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Master’s or PhD in Computer Science, Artificial Intelligence, or High-Performance Computing
  • 4+ years of experience in Machine Learning, with a mandatory split focus between Model Architecture and Systems Optimization
  • Proven experience building and shipping Vision-Language Models (e.g., architectures similar to CLIP, Flamingo, Pix2Struct)
  • Must have experience creating custom evaluation sets for tasks like Document Understanding
  • Expert-level knowledge of SGLang and vLLM for optimized serving
  • Demonstrable experience optimizing models for both NVIDIA (H100) and AMD (MI300x) accelerators
  • Hands-on experience with Knowledge Distillation and Pruning to reduce model latency for target serving sizes
  • A track record of taking complex multi-modal models from research code to a deployed, user-facing production product
Job Responsibility
  • Continuously evaluate and implement the latest research trends in Vision-Language Models, specifically focusing on Referring Expression Comprehension (REC), Document Understanding (Pix2Struct), and Visual Question Answering (VQA)
  • Design and build massive-scale training and evaluation datasets, ensuring multilingual compatibility and broad visual understanding for European market requirements
  • Lead the model co-design process, creating architectures that are natively optimized for accelerator capabilities (compute-bound vs. memory-bound operations)
  • Architect high-throughput serving layers using SGLang and vLLM, optimizing for non-standard decoding strategies
  • Implement scientific experiments to find the Pareto-optimal frontier between serving latency and generation quality
  • Execute Knowledge Distillation (KD), unstructured pruning, and quantization techniques to fit large-scale VLM architectures onto single-node GPU setups (specifically H100 or MI300x) without compromising model quality
  • Write and optimize custom kernels (CUDA/HIP) to reduce serving latency, identifying bottlenecks at the operator level
  • Manage the full pre-training and post-training tech stack, ensuring seamless integration between model weights and inference engines
  • Take ownership of landing the serving-efficient model in a production environment, ensuring reliability and scalability
Employment type: Full-time

Robotics Engineer

sensmore automates the world's largest machines with unprecedented intelligence....
Location
Germany, Berlin / Potsdam
Salary
Not provided
Sensmore GmbH
Expiration Date
Until further notice
Requirements
  • Current Enrollment: Strong academic standing in a Master’s program (Robotics, Computer Science, Electrical Engineering, Automation, or a related field)
  • Technical Proficiency: Strong hands-on experience with Python and PyTorch
  • Ownership & Proactivity: Ability to take ownership of your tasks and drive sub-projects from concept to execution with a "getting things done" mentality
  • GenAI Foundations: A deep interest in Vision-Language Models (VLM) and their application in Robotics and Automation
  • Project Experience: Initial successful projects or coursework in GenAI, Robotics, or Computer Vision
Job Responsibility
  • Data Analysis & Engineering: Analyze and preprocess large-scale, multi-modal datasets including video, radar, and lidar streams
  • Dataset Generation: Work on self-supervised dataset generation pipelines to train next-generation VLA models
  • Model Development: Utilize state-of-the-art GenAI tools and frameworks (e.g., HuggingFace, Gemini, Unsloth) to build and refine models
  • Prompt & Model Optimization: Apply prompt optimization techniques and fine-tuning to improve model reasoning and action generation
  • Training & Evaluation: Execute training runs, monitor performance, and rigorously evaluate results using new industrial data
  • Deployment: Assist in optimizing model deployment to ensure reliability and real-time performance in production-grade heavy industry settings
What we offer
  • Attractive compensation package and stock options
  • Beverages on-site and regular social events
  • Engage with top-tier researchers, engineers, and thought leaders
  • Influence the future of robotic technologies and tackle significant technological challenges
  • Assistance with relocation to Berlin
Employment type: Full-time