Multimodal Speech Engineer Jobs

8 Job Offers

Principal Software Engineer, CoreAI (New)
Company: Microsoft Corporation
Location: United States, Redmond
Salary: 139,900-274,800 USD per year
Expiration date: Until further notice

Senior Data Engineer - AI Focused
Company: Doctolib
Location: France, Paris
Salary: Not provided
Expiration date: Until further notice

Research Intern - GenAI
Company: Appen
Location: Australia, Chatswood, Sydney
Salary: Not provided
Expiration date: Until further notice

Multimodal Speech Engineer, AI Companion
Company: 1X Technologies
Location: United States, Palo Alto
Salary: 150,000-250,000 USD per year
Expiration date: Until further notice

Senior Data Scientist
Company: Beyond Limits
Salary: Not provided
Expiration date: Until further notice

Senior Data Scientist
Company: Beyond Limits
Location: Taiwan
Salary: Not provided
Expiration date: Until further notice

Full-Stack Engineer, AI Companion
Company: 1X Technologies
Location: United States, Palo Alto
Salary: 150,000-250,000 USD per year
Expiration date: Until further notice

Multimodal Speech Engineer
Company: 1X Technologies
Location: United States, Palo Alto
Salary: 150,000-250,000 USD per year
Expiration date: Until further notice

Explore the cutting-edge field of Multimodal Speech Engineering, a role at the intersection of artificial intelligence, human-computer interaction, and robotics. Multimodal Speech Engineers build the next generation of intelligent systems that communicate naturally with people. They develop AI that does not just transcribe spoken words but understands and generates speech in the context of visual cues, environmental sounds, and physical expression. Their work underpins lifelike digital assistants, advanced robotics, immersive entertainment, and accessibility technologies.

A Multimodal Speech Engineer typically designs and implements AI models that integrate multiple data streams. Common responsibilities include architecting and training neural networks that fuse audio (for speech recognition and synthesis), visual data (such as camera input or on-screen context), and sometimes other sensory inputs such as spatial audio or motion data. In these systems, speech generation adapts to what the AI "sees" and "hears" in its environment, so that its responses fit the situation. A key part of the role is synchronizing generated speech with non-verbal elements, such as realistic lip movements on an avatar or expressive gestures on a robot, so that interactions feel coherent and believable. Engineers in this field also spend significant time building large-scale multimodal data pipelines for training and iterating on models to improve their naturalness, emotional expressiveness, and reliability.

The typical skill set for these roles is highly interdisciplinary. A strong foundation in deep learning, with specific expertise in speech processing (automatic speech recognition, or ASR, and text-to-speech, or TTS) and computer vision, is essential.
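Synchronizing generated speech with an avatar's lip movements is often approached by mapping timed phonemes to visemes (mouth shapes) and sampling them at the animation frame rate. The sketch below illustrates that idea under stated assumptions: the phoneme timings are assumed to come from a TTS system's duration or alignment output, and the `PHONEME_TO_VISEME` table, `PhonemeSpan` type, and `visemes_per_frame` function are hypothetical names invented for illustration, not any product's API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, heavily simplified phoneme-to-viseme table (assumption);
# a real system would use a full mapping for its language and avatar rig.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "aa": "open", "ae": "open",
    "uw": "round", "ow": "round",
}


@dataclass
class PhonemeSpan:
    phoneme: str
    start_s: float  # start time in seconds, assumed to come from TTS alignment
    end_s: float    # end time in seconds (exclusive)


def visemes_per_frame(spans: List[PhonemeSpan], fps: int = 30) -> List[str]:
    """Return one viseme label per animation frame, sampled at frame centers."""
    if not spans:
        return []
    total_s = max(s.end_s for s in spans)
    n_frames = int(round(total_s * fps))
    frames: List[str] = []
    for i in range(n_frames):
        t = (i + 0.5) / fps  # time at the center of frame i
        label = "rest"       # default mouth shape when no phoneme covers t
        for s in spans:
            if s.start_s <= t < s.end_s:
                # Unknown phonemes fall back to the neutral "rest" shape.
                label = PHONEME_TO_VISEME.get(s.phoneme, "rest")
                break
        frames.append(label)
    return frames


if __name__ == "__main__":
    spans = [PhonemeSpan("m", 0.0, 0.1), PhonemeSpan("aa", 0.1, 0.3)]
    print(visemes_per_frame(spans, fps=10))
```

Sampling at frame centers keeps each frame tied to a single phoneme; production pipelines usually also blend neighboring visemes (coarticulation) rather than switching shapes abruptly.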
Proficiency in frameworks such as PyTorch or TensorFlow and experience with multimodal fusion techniques (e.g., cross-modal attention and transformer architectures) are standard requirements. Sound software engineering practices are crucial for deploying real-time, low-latency systems. Successful candidates also tend to bring a creative problem-solving mindset, since much of the work involves open-ended questions about how to make interactions feel intuitive and engaging. An understanding of conversational AI, linguistics, or human-robot interaction can be a significant advantage.

Demand for Multimodal Speech Engineers is growing quickly in industries focused on AI companions, social robotics, automotive voice interfaces, virtual reality, and next-generation customer service platforms. It is a career for those who want to narrow the gap between humans and machines and build interactions that are not just functional but natural and empathetic. If you want to build AI that understands tone, context, and unspoken cues, the Multimodal Speech Engineering openings on this page are a good place to start.
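Cross-modal attention, one of the fusion techniques named above, lets features from one modality attend over features from another. The following is a minimal, framework-free sketch of single-head scaled dot-product attention in which audio frames supply the queries and visual frames supply the keys and values. It assumes the inputs are already projected into a shared dimension; real models would learn those projections and typically use multiple heads in PyTorch or TensorFlow. The function names are hypothetical and chosen for illustration only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(audio_queries: List[Vector],
                          visual_keys: List[Vector],
                          visual_values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention where each audio frame attends over visual frames.

    Returns one fused vector per audio frame: a softmax-weighted mix of the
    visual value vectors, weighted by query-key similarity / sqrt(d_k).
    """
    if len(visual_keys) != len(visual_values):
        raise ValueError("visual_keys and visual_values must have the same length")
    d_k = len(visual_keys[0])
    scale = 1.0 / math.sqrt(d_k)
    value_dim = len(visual_values[0])
    fused: List[Vector] = []
    for q in audio_queries:
        weights = softmax([dot(q, k) * scale for k in visual_keys])
        mixed = [sum(w * v[i] for w, v in zip(weights, visual_values))
                 for i in range(value_dim)]
        fused.append(mixed)
    return fused


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]      # two toy audio-frame queries
    keys = [[1.0, 0.0], [0.0, 1.0]]       # two toy visual-frame keys
    values = [[10.0, 0.0], [0.0, 10.0]]   # matching visual-frame values
    for row in cross_modal_attention(audio, keys, values):
        print([round(x, 3) for x in row])
```

Each audio frame ends up weighted toward the visual frame whose key it most resembles, which is the basic mechanism by which such models let speech processing condition on what is seen.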
