Deepgram is looking for an Audio Engineer to own and scale audio quality across our voice AI products. This role sits at the intersection of professional audio engineering and machine-learning infrastructure. You will be responsible for ensuring our voices don’t just sound “correct,” but sound genuinely great to human listeners, across thousands of voices, recording conditions, and use cases.

This is a foundational role. You’ll help define how audio engineering fits into our end-to-end pipeline: from on-site voice actor recording, to speaker-specific cleanup for fine-tuning, to synthetic data generation and large-scale TTS training. You’ll take traditionally manual, GUI-driven audio workflows and turn them into programmatic systems that operate at Deepgram’s scale.
Job Responsibilities:
Identify and correct audio artifacts, loudness inconsistencies, frequency imbalances, and sibilance issues across large-scale voice datasets
Design and implement scalable audio processing pipelines (EQ, compression, de-essing, dynamic range optimization) and normalization strategies for voice data, both across voices and within each voice’s recordings
Optimize audio quality across real and synthetic voices to ensure a consistent product experience across multiple use cases
Lead audio quality decisions during on-site voice actor recording sessions, including microphone selection, placement, gain staging, and environment setup
Define, document, and enforce audio quality standards for external vendors, covering recording setup requirements, signal characteristics, and post-processing expectations, so that vendor-produced audio meets Deepgram’s training and product needs even when recordings are not done on-site
Convert expert-driven, manual audio workflows into automated, repeatable, code-based systems (a minimal pipeline sketch follows this list)
Collaborate closely with research to improve training data quality, especially TTS speaker-specific fine-tuning
Contribute to synthetic data pipelines by defining and validating acoustic characteristics, guiding how different “sound profiles” should be produced and evaluated
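To make the automation point concrete, here is a minimal sketch of what moving a manual cleanup chain into code could look like. It assumes a local FFmpeg install and a flat directory of WAV files named raw_recordings/; the filter chain and every parameter value (high-pass cutoff, compressor settings, −16 LUFS loudness target, 48 kHz output rate) are illustrative placeholders, not Deepgram’s actual processing standards.

```python
# Minimal sketch: batch-apply a cleanup chain (high-pass, de-essing,
# compression, EBU R128 loudness normalization) with FFmpeg.
# All filter parameters are illustrative, not production settings.
import pathlib
import subprocess

# Hypothetical cleanup chain; each stage mirrors a step an engineer
# would otherwise perform by hand in a DAW.
FILTER_CHAIN = (
    "highpass=f=80,"                # remove rumble below ~80 Hz
    "deesser,"                      # tame sibilance (default settings)
    "acompressor=threshold=0.125:ratio=3:attack=20:release=250,"
    "loudnorm=I=-16:TP=-1.5:LRA=11" # normalize to -16 LUFS integrated
)

def process_directory(src_dir: str, dst_dir: str) -> None:
    """Run the cleanup chain over every WAV file in src_dir."""
    out_root = pathlib.Path(dst_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    for wav in sorted(pathlib.Path(src_dir).glob("*.wav")):
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(wav),
             "-af", FILTER_CHAIN,
             "-ar", "48000",         # resample to a common rate
             str(out_root / wav.name)],
            check=True,
        )

if __name__ == "__main__":
    process_directory("raw_recordings", "cleaned_recordings")
```

Note that single-pass loudnorm is only approximate; a two-pass run (measuring first with print_format=json, then feeding the measured values back in) gives tighter conformance at scale.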
Requirements:
Professional audio engineering experience (studio, podcast, radio, live sound, or equivalent)
Deep understanding of EQ, compression, limiting, de-essing, and mastering techniques
Strong familiarity with professional audio tools (Adobe Audition, Logic Pro, Pro Tools, or similar)
Hands-on experience with FFmpeg and command-line audio processing tools (see the loudness-check sketch after this list)
Solid understanding of microphone characteristics, placement, and acoustic principles
A highly trained ear for subtle audio quality differences across voices and environments
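As one example of the command-line work referenced above, the sketch below runs FFmpeg’s loudnorm filter in analysis-only mode over a directory of WAV files and flags loudness and true-peak outliers. The −16 LUFS target, ±2 LU tolerance, and −1 dBTP ceiling are assumptions chosen for illustration, not published Deepgram specs.

```python
# Minimal sketch: measure integrated loudness / true peak with FFmpeg's
# loudnorm analysis pass and flag files that drift from a target.
import json
import pathlib
import subprocess

TARGET_LUFS = -16.0   # hypothetical dataset target
TOLERANCE_LU = 2.0    # hypothetical acceptable deviation

def measure_loudness(path: pathlib.Path) -> dict:
    """Analyze a file with loudnorm; FFmpeg prints the stats as JSON on stderr."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-nostats", "-i", str(path),
         "-af", "loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json",
         "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    # The stats are the last {...} block in FFmpeg's stderr output.
    stderr = result.stderr
    start, end = stderr.rfind("{"), stderr.rfind("}") + 1
    return json.loads(stderr[start:end])

def flag_outliers(src_dir: str) -> None:
    for wav in sorted(pathlib.Path(src_dir).glob("*.wav")):
        stats = measure_loudness(wav)
        lufs = float(stats["input_i"])    # integrated loudness, LUFS
        peak = float(stats["input_tp"])   # true peak, dBTP
        if abs(lufs - TARGET_LUFS) > TOLERANCE_LU or peak > -1.0:
            print(f"FLAG {wav.name}: {lufs:.1f} LUFS, {peak:.1f} dBTP")

if __name__ == "__main__":
    flag_outliers("cleaned_recordings")
```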
Nice to have:
Programming ability (Python preferred) to automate and scale audio workflows
Experience building custom audio plugins or DSP tools
Open-source contributions to audio or signal-processing projects
Background in batch or programmatic audio processing at scale
Familiarity with ML audio preprocessing for ASR or TTS (a short preprocessing sketch follows this list)
Experience managing large-scale audio datasets
Comfort working in creative/audio communities and technical open-source ecosystems
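For the ML preprocessing item, this is roughly the shape of a typical TTS/ASR data preparation step, sketched with librosa and soundfile (both assumed to be installed via pip). The 22.05 kHz sample rate, 30 dB trim threshold, and 0.95 peak ceiling are common defaults used here for illustration only.

```python
# Minimal sketch: typical preprocessing for TTS/ASR training audio --
# resample, trim edge silence, peak-normalize, write 16-bit PCM.
import pathlib

import librosa
import numpy as np
import soundfile as sf

TARGET_SR = 22050   # common TTS training rate; an assumption, not a Deepgram spec

def preprocess(in_path: pathlib.Path, out_path: pathlib.Path) -> None:
    # Load as mono and resample to the target rate.
    y, _ = librosa.load(in_path, sr=TARGET_SR, mono=True)
    # Trim leading/trailing audio quieter than 30 dB below peak.
    y, _ = librosa.effects.trim(y, top_db=30)
    # Peak-normalize with a little headroom to avoid clipping downstream.
    peak = np.max(np.abs(y))
    if peak > 0:
        y = 0.95 * y / peak
    sf.write(out_path, y, TARGET_SR, subtype="PCM_16")

if __name__ == "__main__":
    out_dir = pathlib.Path("tts_ready")
    out_dir.mkdir(exist_ok=True)
    for wav in sorted(pathlib.Path("cleaned_recordings").glob("*.wav")):
        preprocess(wav, out_dir / wav.name)
```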