Multimodal Speech Engineer Job at 1X Technologies (Palo Alto)

Multimodal Speech Engineer, AI Companion

As a Multimodal Speech Engineer on the AI Companion Team, you will lead the deve...

Location

United States, Palo Alto

Salary:

150000.00 - 250000.00 USD / Year

1X Technologies

Expiration Date

Until further notice

Requirements

3+ years of experience in speech and audio modeling domains
Experience with multi-modal conversational models (language, audio, vision)
Ability to take open-ended problems in conversation modeling, develop creative solutions, build proof-of-concepts, and scale them to production

Job Responsibility

Design and implement data pipelines for large-scale speech interactions using internal and external datasets
Train speech-to-speech models that incorporate awareness of NEO’s physical form
Create dynamic responses for a wide range of user queries
Synchronize NEO’s speech with physical gestures and body language
Customize NEO’s speech behavior to reflect different personalities

What we offer

Equity
Health, dental, and vision insurance
401(k) with company match
Paid time off and holidays

Full-time

AI Engineer - Speech & NLP

Interhuman AI is building the next generation of social intelligence infrastruct...

Location

Denmark, København

Salary:

55000.00 - 65000.00 DKK / Year

Life Science Talent

Expiration Date

Until further notice

Requirements

PhD in Machine Learning, Computer Science, or a related field with a focus on speech processing and/or NLP
Track record of building and shipping models
Strong proficiency in Python and deep experience with PyTorch (or JAX/TensorFlow)
Familiarity with the current landscape of speech and multimodal models (e.g., Whisper, audio-LLMs, speech encoders, vision-language models)
You thrive with ambiguity. You can scope your own work, prioritize ruthlessly, and know when to ask for input
Clear communicator—you can explain a complex architecture to both engineers and non-technical stakeholders

Job Responsibility

Design, train, and iterate on speech and language models that extract social and emotional signals from live conversation
Own the full model development lifecycle—from data curation and architecture design through training, evaluation, and production deployment
Build evaluation frameworks and benchmarks that capture the subtleties of human interaction that standard metrics miss
Stay at the frontier of multimodal research and translate relevant advances into our production stack
Collaborate closely with engineering to ensure models meet real-time latency and scalability requirements

What we offer

Competitive salary and meaningful equity in an early-stage, venture-backed company
Direct influence on technical direction—your work shapes the product, not just a feature
A small, focused team where your contributions are visible and impactful from day one
Flexibility on location and working arrangements

Full-time

Research Scientist Intern, Audio Quality with AI (PhD)

The Meta Reality Labs Research Team brings together a world-class team of resear...

Location

United States, Redmond

Salary:

7650.00 - 12134.00 USD / Month

AI Research Scientist, VLM (vision language models)

Meta builds technologies that help people connect, find communities, and grow bu...

Location

United States, Bellevue

Salary:

154000.00 - 217000.00 USD / Year

Senior Applied Scientist and Principal Applied Scientist

The Core AI Speech Group brings together talents in the areas of signal processi...

Location

United States, Redmond

Salary:

119800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research) OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research) OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research) OR equivalent experience
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Develop novel speech algorithms to advance state-of-the-art speech technologies for real-world user scenarios, especially integrating speech with LLMs for multimodal modeling
Help address scalability problems by adjusting to stakeholder needs
Work with large-scale computing frameworks, data analysis systems, and modeling environments to improve models
Apply models to real products and verify their effects through iteration
Experiment by putting multiple models in production and evaluating their performance. Continue to monitor how algorithms perform against expected behaviors and performance or accuracy guardrails

Full-time

Language Engineer

The Amazon Artificial General Intelligence (AGI) Data Services organization is r...

Location

United States, Sunnyvale; Boston; Bellevue

Salary:

86500.00 - 151400.00 USD / Year

Amazon

Expiration Date

Until further notice

Requirements

Experience owning and executing language data collection projects, including development of guidelines, label sets, and annotation workflows
Experience with language annotation and other forms of data markup
Experience in one or more scripting languages (e.g., Python, Ruby, Perl)
Master's or higher degree in a relevant field (Computational Linguistics or equivalent field with computational analysis)
2+ years experience in computational linguistics or language data processing or AI data creation
Experience working with speech, text, and multimodal data in multiple languages
Excellent communication, strong organizational skills, and close attention to detail
Comfortable working in a fast-paced, highly collaborative, dynamic work environment

Job Responsibility

Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches
Analyze and extract insights from large amounts of data
Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language
Use modeling tools to bootstrap or test new AI functionalities
Collaborate with scientists, software engineers, and other data creators to evaluate performance of AI models

What we offer

health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
401(k) matching
paid time off
parental leave
sign-on payments
restricted stock units (RSUs)

Full-time

AI Research Engineer - Social Products (Technical Leadership)

We're hiring Research Engineers to join teams across Meta working at the interse...

Location

United States, Bellevue

Salary:

219000.00 - 301000.00 USD / Year

Applied Scientist

As an Applied Scientist at Dialpad, you'll be a key driver within our AI team, c...

Location

Canada, Vancouver

Salary:

161500.00 - 191500.00 CAD / Year

Dialpad

Expiration Date

Until further notice

Requirements

Master's or PhD degree in Computer Science, Machine Learning, Computational Linguistics, or a related quantitative field
2+ years of industry experience in Machine Learning/NLP for Master's degree holders, or 1+ years for PhD holders
Deep understanding of LLMs: Demonstrated experience with training, fine-tuning (PEFT/LoRA), and alignment techniques (RLHF/DPO) for specific domains or tasks
Experience with Agentic Systems: Familiarity with building autonomous agents, including concepts like tool use, function calling, reasoning chains (CoT), and memory management
Strong proficiency in Python and PyTorch, with the ability to write clean, production-ready research code
Research Track Record: A history of publishing in top-tier conferences (ACL, EMNLP, NeurIPS, ICASSP) is highly valued
Multimodal Awareness: Familiarity with speech technologies (ASR, TTS) or processing real-time audio streams is a strong plus
Ability to bridge the gap between research and product, translating complex technical concepts into business value
Familiarity with version control tools like Git for collaborative projects

Job Responsibility

Research and develop state-of-the-art algorithms for autonomous voice agents, specifically focusing on real-time speech processing and reasoning loops
Advance DialpadGPT: Design and execute distributed training strategies to optimize our proprietary LLMs for agentic behaviors, including precise tool use, instruction following, and latency-constrained generation
Conduct rigorous evaluation and monitoring of model performance, and troubleshoot issues with a keen understanding of the resulting business impact
Design and implement orchestration layers that effectively chain LLMs with external tools and APIs to solve complex customer problems autonomously
Work with large-scale multimodal datasets (text, audio) to improve model robustness and alignment
Collaborate with engineering, product, and design teams to deploy scalable, low-latency models and algorithms in production
Submit papers to top-tier academic conferences (ACL, EMNLP, NeurIPS) and contribute to the team's research culture

What we offer

Work at the center of the AI transformation in business communications
Build and ship agentic AI products that are redefining how companies operate
Join a team where AI amplifies every employee's impact
Competitive salary, comprehensive benefits, and real opportunities for growth

Full-time
