Multimodal Speech Engineer, AI Companion Jobs

2 Job Offers

Multimodal Speech Engineer, AI Companion
Location: United States, Palo Alto
Salary: 150,000.00 - 250,000.00 USD / Year
Company: 1X Technologies
Expiration Date: Until further notice
Full-Stack Engineer, AI Companion
Location: United States, Palo Alto
Salary: 150,000.00 - 250,000.00 USD / Year
Company: 1X Technologies
Expiration Date: Until further notice
Explore cutting-edge Multimodal Speech Engineer, AI Companion jobs and step into the forefront of artificial intelligence, where you will build the next generation of interactive agents. This specialized role sits at the intersection of speech technology, machine learning, and human-computer interaction, focused on creating seamless, natural, and emotionally resonant communication for AI companions, virtual assistants, and embodied robots. Professionals in this field develop the core speech systems that allow machines not only to talk and listen but also to understand and express meaning through multiple sensory channels.

A Multimodal Speech Engineer typically designs, implements, and optimizes complex models that process and generate conversational speech synchronized with other modalities such as vision, gesture, and spatial audio. Common responsibilities include architecting and training large-scale, context-aware speech-to-speech and text-to-speech models. Engineers in this domain build data pipelines for diverse speech interaction datasets, customize vocal personalities and emotional tones, and, crucially, ensure that generated speech is temporally and semantically aligned with visual cues such as facial expressions and body language. The goal is to move beyond simple voice responses and create holistic, engaging characters that users can interact with intuitively.

Typical skills and requirements for these roles are both deep and broad. A strong foundation in speech and audio signal processing, acoustic modeling, and neural speech synthesis (e.g., Tacotron, VITS) is essential. Proficiency in deep learning frameworks such as PyTorch or TensorFlow is required, alongside experience with multimodal machine learning architectures that fuse language, audio, and vision representations. Candidates generally need a proven ability to take open-ended research problems, develop novel solutions, and deploy robust, scalable systems. Strong software engineering skills for production-level code are a must, and a background in linguistics or cognitive science can help with understanding prosody and pragmatics. Because the field is evolving rapidly, a passion for continuous learning and innovation is critical.

For those passionate about defining the future of human-AI relationships, Multimodal Speech Engineer, AI Companion jobs offer a unique opportunity to blend technical expertise with creative problem-solving. This career path suits individuals who want to translate groundbreaking research into tangible experiences, crafting the voice and interactive soul of future AI entities. Discover your role in this pioneering field and contribute to building companions that communicate, connect, and understand in profoundly human ways.
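To make the fusion requirement above concrete, here is a minimal sketch in PyTorch of the kind of audio-visual fusion module these roles describe: audio frame embeddings attend over visual frame embeddings so that generated speech can be conditioned on facial-expression features. The class name, dimensions, and choice of cross-attention are illustrative assumptions, not any specific company's system.

```python
import torch
import torch.nn as nn


class AudioVisualFusion(nn.Module):
    """Fuse audio-frame features with visual-frame features via cross-attention.

    Hypothetical sketch: audio frames query the visual stream, asking in effect
    "what does the face look like while this sound is being produced?"
    """

    def __init__(self, audio_dim=512, visual_dim=768, fused_dim=512, num_heads=8):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.cross_attn = nn.MultiheadAttention(fused_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(fused_dim)

    def forward(self, audio_feats, visual_feats):
        # audio_feats:  (batch, T_audio, audio_dim)   e.g. per-speech-frame embeddings
        # visual_feats: (batch, T_video, visual_dim)  e.g. per-video-frame embeddings
        q = self.audio_proj(audio_feats)
        kv = self.visual_proj(visual_feats)
        attended, _ = self.cross_attn(query=q, key=kv, value=kv)
        # Residual connection keeps the acoustic signal dominant; vision modulates it.
        return self.norm(q + attended)


if __name__ == "__main__":
    fusion = AudioVisualFusion()
    audio = torch.randn(2, 200, 512)  # 2 clips, 200 audio frames each
    video = torch.randn(2, 50, 768)   # 2-second clips at 25 fps
    fused = fusion(audio, video)
    print(fused.shape)  # torch.Size([2, 200, 512])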
