
Research Engineer – Audio & Speech Models

United States, Palo Alto · Job Posted January 13, 2026

Job Description

As a Research Engineer - Audio & Speech Models, you will be a core contributor on Zyphra’s Audio Team, building the next generation of open-source text-to-speech and audio models. You will be deeply involved in the entire model training process from data gathering and processing to designing novel architectures and training methodologies.

Job Responsibility

  • Build the next generation of open-source text-to-speech and audio models
  • Take part in the entire model training process, from data gathering and processing to designing novel architectures and training methodologies
  • Work across:
      • Large-scale audio training runs
      • Performance optimization of our training stack
      • Audio dataset collection, processing, and evaluation
      • Architecture and training methodology ablations and improvements

Requirements

  • Strong research taste and intuition. The ability to work through a research project from conception to execution to write-up
  • Strong implementation and prototyping ability (can take an idea from conception to experimentation quickly)
  • The ability to work well with others in a high-paced research setting
  • The ability to rapidly learn new fields, and excitement about implementing new ideas
  • Excellent communication and collaboration skills, and the ability to work effectively on both research and engineering implementation at scale

Nice to have

  • Expertise and intuition for training models in the audio domain, including text-to-speech, ASR, speech-to-speech, speech-emotion-recognition, or other models
  • Experience in training audio autoencoders
  • Understanding of signal processing, especially of audio signals
  • Experience with diffusion models, consistency models, or GANs
  • Experience with training on large-scale (multi-node) GPU clusters
  • Strong grasp of proper experimental methodology for running rigorous ablations and other hypothesis testing
  • Understanding of and interest in large-scale, highly parallel data processing pipelines
  • Proficiency with PyTorch and Python
  • Experience contributing to large pre-existing codebases and rapidly getting up to speed
  • Previously published machine learning research in well-respected venues
  • Postgraduate degree in a scientific subject (Computer Science, EE/EECS, Mathematics, Physics, Machine Learning)

What we offer

  • Comprehensive medical, dental, vision, and FSA plans
  • Competitive compensation and 401(k)
  • Relocation and immigration support on a case-by-case basis
  • On-site meals prepared by a dedicated culinary team
  • Thursday Happy Hours

Similar Jobs for Research Engineer – Audio & Speech Models

8 matching positions

Senior Speech & Audio Biomarkers ML Engineer / Data Scientist / LLM Researcher

Adalyon is transforming clinical trials with a behavioural-intelligence platform...
Location: Finland
Salary: Not provided
Life Science Talent
Expiration Date: Until further notice
Requirements
  • Advanced degree (PhD), postdoctoral experience, or equivalent research depth in speech technology, audio signal processing, acoustics, machine learning, data science, computational linguistics, or a related field
  • Audio and NLP experience – You have built systems that process raw audio and transcripts to derive actionable insights. Familiarity with prosodic and spectral features, and the ability to engineer features like jitter, shimmer and harmonic-to-noise ratio, which have been shown to correlate with cognitive and emotional conditions
  • Speech processing toolkits: Experience with speech processing toolkits (e.g., librosa, Kaldi, Praat) and ML frameworks (PyTorch, TensorFlow, scikit-learn) is essential
  • LLM expertise – Hands-on experience with large language models, including prompting, fine-tuning and integrating them into downstream ML pipelines. Ability to interpret and control LLM outputs to ensure transparency and reproducibility, avoiding the unpredictable behaviour of generic LLMs
  • Startup mindset – Comfortable working in an agile, evolving environment. You take initiative, think creatively and can operate with limited structure. You thrive when delivering an MVP while planning for scalable solutions
  • Practical programming ability, ideally in Python and relevant scientific/data tooling. You do not need to be a software engineer, but you must be able to build the systems and pipelines needed for your research.
Job Responsibility
  • Conversational design & data pipeline
  • Signal processing & feature extraction
  • Model development & integration
  • Validation & evidence generation
  • Research & innovation
What we offer
  • A competitive salary package that reflects your experience and the value you create
  • The opportunity to work with advanced AI, acoustic analysis, and speech-based biomarker technology at an early stage
  • A central and highly influential role with direct access to research and technology leadership
  • High autonomy, high visibility, and the opportunity to shape the scientific foundation of a growing company
  • A dynamic and flexible startup environment with room for deep technical discussion, scientific exploration, and practical impact
  • Full-time

Senior Machine Learning Engineer, Speech Recognition (ASR)

We are on a mission to ensure everyone has access to medical expertise, no matte...
Location: Denmark, København
Salary: Not provided
Life Science Talent
Expiration Date: Until further notice
Requirements
  • Strong programming skills in Python and the ability to contribute to production-grade codebases
  • Hands-on experience in speech recognition and ASR
  • Experience building ML systems that can be deployed and operated, including pipelines, CI/CD practices, and monitoring
  • Clear communication and collaboration skills across research, engineering, and product
  • A Master’s degree in computer science, engineering, mathematics, statistics, physics, or a related field, or equivalent professional experience
Job Responsibility
  • Train and fine-tune ASR models at scale, including dataset strategy, augmentation, and domain adaptation to real-world clinical audio
  • Build and improve validation and evaluation frameworks, including WER and targeted analysis across speakers, environments, devices, and clinical terminology
  • Deploy and operate ASR inference services with focus on reliability, latency, and efficiency in production
  • Optimize inference latency and throughput, including batching strategies, model export choices, and hardware-aware profiling
  • Build and maintain APIs and services in frameworks like FastAPI, Kafka, and NVIDIA Triton, and deploy and run them on Kubernetes
  • Take technical ownership of core ASR components, shaping best practices for modelling, evaluation, and production reliability across the team, and supporting the growth of engineers working on speech systems
  • Work closely with product and platform teams on safe rollouts, monitoring, and continuous improvement based on real-world feedback
What we offer
  • Equipment provided by Corti
  • Full-time

AI Engineer - Speech & NLP

Interhuman AI is building the next generation of social intelligence infrastruct...
Location: Denmark, København
Salary: 55000.00 - 65000.00 DKK / Year
Life Science Talent
Expiration Date: Until further notice
Requirements
  • PhD in Machine Learning, Computer Science, or a related field with a focus on speech processing and/or NLP
  • Track record of building and shipping models
  • Strong proficiency in Python and deep experience with PyTorch (or JAX/TensorFlow)
  • Familiarity with the current landscape of speech and multimodal models (e.g., Whisper, audio-LLMs, speech encoders, vision-language models)
  • You thrive with ambiguity. You can scope your own work, prioritize ruthlessly, and know when to ask for input
  • Clear communicator—you can explain a complex architecture to both engineers and non-technical stakeholders
Job Responsibility
  • Design, train, and iterate on speech and language models that extract social and emotional signals from live conversation
  • Own the full model development lifecycle—from data curation and architecture design through training, evaluation, and production deployment
  • Build evaluation frameworks and benchmarks that capture the subtleties of human interaction that standard metrics miss
  • Stay at the frontier of multimodal research and translate relevant advances into our production stack
  • Collaborate closely with engineering to ensure models meet real-time latency and scalability requirements
What we offer
  • Competitive salary and meaningful equity in an early-stage, venture-backed company
  • Direct influence on technical direction—your work shapes the product, not just a feature
  • A small, focused team where your contributions are visible and impactful from day one
  • Flexibility on location and working arrangements
  • Full-time

Audio Algorithm Engineer

Plaud is building the next generation intelligence infrastructure and interfaces...
Location: United States, San Francisco
Salary: 150000.00 - 240000.00 USD / Year
Plaud
Expiration Date: Until further notice
Requirements
  • 3 to 5 years of experience training speech algorithms, including fine-tuning and training SpeechLLM
  • Experience processing hundreds of thousands of hours of speech data and training speech recognition models
  • Familiarity with SpeechLLM and speech SSL training, including from-scratch training experience with models similar to StepAudio, Qwen3omni, etc.
  • Papers at top speech conferences such as Interspeech or ICASSP, or speech-related patents
Job Responsibility
  • For the multi-language ASR system, survey the research literature for ways to optimize the terminology thesaurus, and design sound terminology filtering and hotword optimization solutions
  • Implement multi-language hotword algorithms based on SpeechLLM and optimize their performance
  • Collaborate with the engineering team to deploy the hotword recognition solution
  • Fine-tune the speech recognition model on scenario-specific data to improve ASR accuracy across multiple languages and industries
  • Build a test set and evaluation system for the keyword recognition and industry recognition engines, and evaluate how well open-source models and commercial interfaces handle terminology and industry-specific recognition
What we offer
  • Top-tier healthcare for employees and dependents, including dental and vision, and a generous employer subsidy
  • 401(k) plan for full time employees with company matching
  • Unlimited PTO, plus 13 paid holidays
  • 12 weeks of paid time off to spend time with your new family, regardless of gender
  • New hires are equipped with their choice of new top-of-the-line laptops and workstation setups
  • Best office equipment
  • Annual offsites
  • Free office drinks and snacks
  • Performance bonus
  • Equity
  • Full-time

Research Scientist Intern, Audio Quality with AI (PhD)

The Meta Reality Labs Research Team brings together a world-class team of resear...
Location: United States, Redmond
Salary: 7650.00 - 12134.00 USD / Month
Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD degree in the field of Speech and Hearing Science, Auditory Neuroscience, Computational Neuroscience, Computer Science, Artificial Intelligence, Generative AI, Transformer Models, Machine Learning, Signal Processing, or Computer Vision
  • 3+ years experience with Python, Matlab, or similar
  • 3+ years experience with machine learning software platforms such as PyTorch, TensorFlow, etc
  • Background in speech perception, psychoacoustics, or acoustic phonetics
  • Experience deploying novel audio computational models and LLMs
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility
  • Investigate systematic phonemic errors as causal factors in perceived speech quality degradation, and link them to human perceptual judgments
  • Build and curate datasets and benchmarks of speech for phoneme-level analysis
  • Explore and compare the capabilities of audio and video (multimodal) LLMs as tools to support this analysis
  • Relate findings to human perceptual data (quality preference and intelligibility) and translate them into actionable insights for research and engineering teams
  • Where appropriate, adapt multimodal models to the task in a supporting capacity
  • Collaborate with researchers, engineers, and cross-functional partners to define goals, communicate findings, and drive improvements in speech quality
  • Develop tools and infrastructure to streamline and scale the analysis
  • Stay current with research in speech perception and audio quality and intelligibility assessment, and incorporate best practices into Meta's workflows
  • Disseminate results through internal reports and presentations, and, when appropriate, external publications
What we offer
  • Benefits
  • Full-time

Research Scientist Intern, Audio

We are seeking a highly motivated and talented Audio Research Scientist Intern t...
Location: United States, Redmond
Salary: 7313.00 - 12134.00 USD / Month
Meta
Expiration Date: Until further notice
Requirements
  • Currently has, or is in the process of obtaining, a PhD in Computer Science, Electrical Engineering, Auditory Neuroscience, Audio Signal Processing, or a related field
  • Experience in building deep learning models
  • Experience with LLMs
  • 2+ years experience with Python and PyTorch
  • Understanding of audio processing concepts
  • Proven communication and collaboration skills
  • Demonstrated skill in learning and applying new concepts, techniques, and tools to solve complex problems
  • Must obtain work authorization in country of employment at the time of hire and maintain ongoing work authorization during employment
Job Responsibility
  • Train or fine-tune audio models in PyTorch
  • Process and analyze speech and audio data, including binaural data simulation, data cleaning, feature extraction, and visualization
  • Collaborate with other researchers to collect data through listening experiments
  • Design and conduct experiments to evaluate the performance of these models and interpret results
  • Communicate findings through written reports and presentations

AI Research Scientist - Meta Superintelligence Labs (PhD)

Meta is seeking a Research Scientist to join its Meta Superintelligence Labs org...
Location: United States, Menlo Park
Salary: 122000.00 - 181000.00 USD / Year
Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Ph.D. and established track record of leading and/or contributing to influential research, e.g. as evidenced by high-impact publications at peer-reviewed AI conferences (e.g. NeurIPS, CVPR, ICML, ICLR, ICCV, ACL, Interspeech and ICASSP)
  • Experience communicating research for public audiences of peers, as well as non-technical audiences
  • Currently has, or is in the process of obtaining a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
Job Responsibility
  • Perform research to advance the state of the art in speech and audio LLMs
  • Work with researchers and engineers in a highly collaborative environment to achieve goals and milestones
  • Influence the future of research in frontier modeling with detailed technical reports
What we offer
  • Bonus
  • Equity
  • Benefits
  • Full-time

Applied Scientist

As an Applied Scientist at Dialpad, you'll be a key driver within our AI team, c...
Location: Canada, Vancouver
Salary: 161500.00 - 191500.00 CAD / Year
Dialpad
Expiration Date: Until further notice
Requirements
  • Master's or PhD degree in Computer Science, Machine Learning, Computational Linguistics, or a related quantitative field
  • 2+ years of industry experience in Machine Learning/NLP for Master's degree holders, or 1+ years for PhD holders
  • Deep understanding of LLMs: Demonstrated experience with training, fine-tuning (PEFT/LoRA), and alignment techniques (RLHF/DPO) for specific domains or tasks
  • Experience with Agentic Systems: Familiarity with building autonomous agents, including concepts like tool use, function calling, reasoning chains (CoT), and memory management
  • Strong proficiency in Python and PyTorch, with the ability to write clean, production-ready research code
  • Research Track Record: A history of publishing in top-tier conferences (ACL, EMNLP, NeurIPS, ICASSP) is highly valued
  • Multimodal Awareness: Familiarity with speech technologies (ASR, TTS) or processing real-time audio streams is a strong plus
  • Ability to bridge the gap between research and product, translating complex technical concepts into business value
  • Familiarity with version control tools like Git for collaborative projects
Job Responsibility
  • Research and develop state-of-the-art algorithms for autonomous voice agents, specifically focusing on real-time speech processing and reasoning loops
  • Advance DialpadGPT: Design and execute distributed training strategies to optimize our proprietary LLMs for agentic behaviors, including precise tool use, instruction following, and latency-constrained generation
  • Conduct rigorous evaluation and monitoring of model performance, and troubleshoot issues with a keen understanding of the resulting business impact
  • Design and implement orchestration layers that effectively chain LLMs with external tools and APIs to solve complex customer problems autonomously
  • Work with large-scale multimodal datasets (text, audio) to improve model robustness and alignment
  • Collaborate with engineering, product, and design teams to deploy scalable, low-latency models and algorithms in production
  • Submit papers to top-tier academic conferences (ACL, EMNLP, NeurIPS) and contribute to the team's research culture
What we offer
  • Work at the center of the AI transformation in business communications
  • Build and ship agentic AI products that are redefining how companies operate
  • Join a team where AI amplifies every employee's impact
  • Competitive salary, comprehensive benefits, and real opportunities for growth
  • Full-time