Multimodal Speech Engineer

United States, Palo Alto · 150000.00 - 250000.00 USD / Year · Job Posted December 01, 2025

Job Description

The AI Companion team creates the speech interface for NEO, as well as the physical awareness behaviors that evoke trust, warmth, and competence when NEO interacts with people. As a Multimodal Speech Engineer on the AI Companion Team, you will lead the effort to create a conversational speech model, from design to data collection to deployment. You will develop real-time architectures that enable NEO not only to converse with users, but also to incorporate other modalities like vision, spatial audio, and body language. You will work closely with the design team to reflect NEO’s personality and 1X’s brand values in the way NEO speaks and responds to users, and with the autonomy team to ensure that NEO’s speech models are aware of its own physical capabilities.

Job Responsibility

  • Design and implement data pipelines for large-scale speech interactions from NEO data and external datasets
  • Train speech-to-speech models to be aware of NEO’s embodiment
  • Design appropriate responses for a variety of user queries
  • Synchronize speech with body language
  • Customize NEO with different personalities

Requirements

  • 3+ years of experience in speech and audio modeling domains
  • Experience in multi-modal conversational models (language, audio, vision) is a strong plus
  • Ability to take open-ended problems in conversation models, come up with creative solutions, implement proof-of-concepts, and bring them to production

Nice to have

  • Experience in multi-modal conversational models (language, audio, vision)

What we offer

  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays


Similar Jobs for Multimodal Speech Engineer

8 matching positions

Multimodal Speech Engineer, AI Companion

As a Multimodal Speech Engineer on the AI Companion Team, you will lead the deve...
Location: United States, Palo Alto
Salary: 150000.00 - 250000.00 USD / Year
Company: 1X Technologies
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in speech and audio modeling domains
  • Experience with multi-modal conversational models (language, audio, vision)
  • Ability to take open-ended problems in conversation modeling, develop creative solutions, build proof-of-concepts, and scale them to production
Job Responsibility
  • Design and implement data pipelines for large-scale speech interactions using internal and external datasets
  • Train speech-to-speech models that incorporate awareness of NEO’s physical form
  • Create dynamic responses for a wide range of user queries
  • Synchronize NEO’s speech with physical gestures and body language
  • Customize NEO’s speech behavior to reflect different personalities
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime

AI Engineer - Speech & NLP

Interhuman AI is building the next generation of social intelligence infrastruct...
Location: Denmark, København
Salary: 55000.00 - 65000.00 DKK / Year
Company: Life Science Talent
Expiration Date: Until further notice
Requirements
  • PhD in Machine Learning, Computer Science, or a related field with a focus on speech processing and/or NLP
  • Track record of building and shipping models
  • Strong proficiency in Python and deep experience with PyTorch (or JAX/TensorFlow)
  • Familiarity with the current landscape of speech and multimodal models (e.g., Whisper, audio-LLMs, speech encoders, vision-language models)
  • You thrive with ambiguity. You can scope your own work, prioritize ruthlessly, and know when to ask for input
  • Clear communicator—you can explain a complex architecture to both engineers and non-technical stakeholders
Job Responsibility
  • Design, train, and iterate on speech and language models that extract social and emotional signals from live conversation
  • Own the full model development lifecycle—from data curation and architecture design through training, evaluation, and production deployment
  • Build evaluation frameworks and benchmarks that capture the subtleties of human interaction that standard metrics miss
  • Stay at the frontier of multimodal research and translate relevant advances into our production stack
  • Collaborate closely with engineering to ensure models meet real-time latency and scalability requirements
What we offer
  • Competitive salary and meaningful equity in an early-stage, venture-backed company
  • Direct influence on technical direction—your work shapes the product, not just a feature
  • A small, focused team where your contributions are visible and impactful from day one
  • Flexibility on location and working arrangements
  • Fulltime

Research Scientist Intern, Audio Quality with AI (PhD)

The Meta Reality Labs Research Team brings together a world-class team of resear...
Location: United States, Redmond
Salary: 7650.00 - 12134.00 USD / Month
Company: Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in the field of Speech and Hearing Science, Auditory Neuroscience, Computational Neuroscience, Computer Science, Artificial Intelligence, Generative AI, Transformer Models, Machine Learning, Signal Processing or Computer vision
  • 3+ years experience with Python, Matlab, or similar
  • 3+ years experience with machine learning software platforms such as PyTorch, TensorFlow, etc
  • Background in speech perception, psychoacoustics, or acoustic phonetics
  • Experience deploying novel audio computational models and LLMs
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Investigate systematic phonemic errors as causal factors in perceived speech quality degradation, and link them to human perceptual judgments
  • Build and curate datasets and benchmarks of speech for phoneme-level analysis
  • Explore and compare the capabilities of audio and video (multimodal) LLMs as tools to support this analysis
  • Relate findings to human perceptual data (quality preference and intelligibility) and translate them into actionable insights for research and engineering teams
  • Where appropriate, adapt multimodal models to the task in a supporting capacity
  • Collaborate with researchers, engineers, and cross-functional partners to define goals, communicate findings, and drive improvements in speech quality
  • Develop tools and infrastructure to streamline and scale the analysis
  • Stay current with research in speech perception and audio quality and intelligibility assessment, and incorporate best practices into Meta's workflows
  • Disseminate results through internal reports and presentations, and, when appropriate, external publications
What we offer
  • benefits
  • Fulltime

AI Research Scientist, VLM (vision language models)

Meta builds technologies that help people connect, find communities, and grow bu...
Location: United States, Bellevue
Salary: 154000.00 - 217000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • A PhD in AI, computer science, or related technical fields
  • Publications in machine learning, computer vision, NLP, speech
  • Experience writing software and executing complex experiments involving large AI models and datasets
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Lead, collaborate, and execute on research that pushes forward the state of the art in multimodal reasoning and generation research
  • Work towards long-term ambitious research goals, while identifying intermediate milestones
  • Directly contribute to experiments, including designing experimental details, writing reusable code, running evaluations, and organizing results
  • Work with a large team
  • Contribute to publications and open-sourcing efforts
  • Mentor other team members. Play a significant role in healthy cross-functional collaboration
  • Prioritize research that can be applied to Meta's product development
  • Push state of the art in multimodal generative AI
  • Explore new techniques for advanced reasoning and multimodal understanding for AI Assistants
  • Mentor and work with AI/ML engineers to find a path from research to production
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

Senior Applied Scientist and Principal Applied Scientist

The Core AI Speech Group brings together talents in the areas of signal processi...
Location: United States, Redmond
Salary: 119800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics predictive analytics, research) OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research) OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research) OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Develop novel speech algorithms to advance state-of-the-art speech technologies for real-world user scenarios, especially in integrating speech with LLMs for multimodal modeling
  • Help address scalability problems by adjusting to stakeholder needs
  • Work with large-scale computing frameworks, data analysis systems, and modeling environments to improve models
  • Apply models to real products, then verify their effects through iterations
  • Experiment by putting multiple models in production and evaluating their performance. Continue to monitor how each algorithm performs against expected behaviors and performance or accuracy guardrails
  • Fulltime

Language Engineer

The Amazon Artificial General Intelligence (AGI) Data Services organization is r...
Location: United States, Sunnyvale; Boston; Bellevue
Salary: 86500.00 - 151400.00 USD / Year
Company: Amazon
Expiration Date: Until further notice
Requirements
  • Experience owning and executing language data collection projects, including guidelines, labelset and annotation workflow development
  • Experience with language annotation and other forms of data markup
  • Experience in one or more scripting languages (e.g., Python, Ruby, Perl)
  • Master's or higher degree in a relevant field (Computational Linguistics or equivalent field with computational analysis)
  • 2+ years experience in computational linguistics or language data processing or AI data creation
  • Experience working with speech, text, and multimodal data in multiple languages
  • Excellent communication, strong organizational skills, and strong attention to detail
  • Comfortable working in a fast-paced, highly collaborative, dynamic work environment
Job Responsibility
  • Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
  • Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches
  • Analyze and extract insights from large amounts of data
  • Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language
  • Use modeling tools to bootstrap or test new AI functionalities
  • Collaborate with scientists, software engineers, and other data creators to evaluate performance of AI models
What we offer
  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave
  • sign-on payments
  • restricted stock units (RSUs)
  • Fulltime

AI Research Engineer - Social Products (Technical Leadership)

We're hiring Research Engineers to join teams across Meta working at the interse...
Location: United States, Bellevue
Salary: 219000.00 - 301000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience with large scale model training, implementing algorithms, and evaluating speech-based systems
  • 5+ years of experience as an Applied AI Research Scientist or Applied AI Research Engineer
Job Responsibility
  • Contribute to the training of next-generation multimodal foundation models, advance their capabilities in understanding, generation, and grounding, and enable them for downstream product use-cases
  • Support creative data sourcing, high-quality pre/mid/post-training data curation, and scale and optimize data pipelines for multimodal large language models (LLMs)
  • Lead, collaborate, and execute on research that pushes forward the state of the art in multimodal reasoning and generation research, and prioritize research that can be directly applied to Meta's product development
What we offer
  • bonus
  • equity
  • Fulltime

Applied Scientist

As an Applied Scientist at Dialpad, you'll be a key driver within our AI team, c...
Location: Canada, Vancouver
Salary: 161500.00 - 191500.00 CAD / Year
Company: Dialpad
Expiration Date: Until further notice
Requirements
  • Master's or PhD degree in Computer Science, Machine Learning, Computational Linguistics, or a related quantitative field
  • 2+ years of industry experience in Machine Learning/NLP for Master's degree holders, or 1+ years for PhD holders
  • Deep understanding of LLMs: Demonstrated experience with training, fine-tuning (PEFT/LoRA), and alignment techniques (RLHF/DPO) for specific domains or tasks
  • Experience with Agentic Systems: Familiarity with building autonomous agents, including concepts like tool use, function calling, reasoning chains (CoT), and memory management
  • Strong proficiency in Python and PyTorch, with the ability to write clean, production-ready research code
  • Research Track Record: A history of publishing in top-tier conferences (ACL, EMNLP, NeurIPS, ICASSP) is highly valued
  • Multimodal Awareness: Familiarity with speech technologies (ASR, TTS) or processing real-time audio streams is a strong plus
  • Ability to bridge the gap between research and product, translating complex technical concepts into business value
  • Familiarity with version control tools like Git for collaborative projects
Job Responsibility
  • Research and develop state-of-the-art algorithms for autonomous voice agents, specifically focusing on real-time speech processing and reasoning loops
  • Advance DialpadGPT: Design and execute distributed training strategies to optimize our proprietary LLMs for agentic behaviors, including precise tool use, instruction following, and latency-constrained generation
  • Conduct rigorous evaluation and monitoring of model performance, and troubleshoot issues with a keen understanding of their business impact
  • Design and implement orchestration layers that effectively chain LLMs with external tools and APIs to solve complex customer problems autonomously
  • Work with large-scale multimodal datasets (text, audio) to improve model robustness and alignment
  • Collaborate with engineering, product, and design teams to deploy scalable, low-latency models and algorithms in production
  • Submit papers to top-tier academic conferences (ACL, EMNLP, NeurIPS) and contribute to the team's research culture
What we offer
  • Work at the center of the AI transformation in business communications
  • Build and ship agentic AI products that are redefining how companies operate
  • Join a team where AI amplifies every employee's impact
  • Competitive salary, comprehensive benefits, and real opportunities for growth
  • Fulltime