Audio Inference Engineer, Model Efficiency

Cohere

Location:
United States; Canada, New York

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Our team is a fast-growing group of committed researchers and engineers. The mission of the team is to build reliable machine learning systems and optimize audio inference serving efficiency using innovative techniques. As an engineer on this team, you will work on advancing core audio model serving metrics, including latency, throughput, and quality, by diving deep into our systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads. You’ll collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment, with a special focus on real-time and streaming audio inference.

Job Responsibility:

  • Work on advancing core audio model serving metrics, including latency, throughput, and quality, by diving deep into our systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads
  • Collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment, with a special focus on real-time and streaming audio inference

Requirements:

  • Significant experience developing high-performance audio or machine learning inference systems
  • Proficiency with programming languages such as C++ and Python
  • Hands-on experience with deep learning models for audio, speech, or language applications
  • A bias for action and a strong results-oriented mindset

Nice to have:

  • GPU programming, low-level system optimization, and model parallelization techniques over multiple GPUs
  • Experience with duplex real-time streaming architectures
  • Internals of machine learning frameworks for audio (such as PyTorch, TensorFlow, or specialized audio libraries)
  • Experience with inference frameworks like vLLM, SGLang, TensorRT-LLM, or custom distributed inference systems
  • Sequence modeling (e.g., transformers for audio/speech) and end-to-end audio pipeline optimization

What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Audio Inference Engineer, Model Efficiency

Senior Inference ML Runtime Engineer

The Inference ML Engineering team at Cerebras Systems is dedicated to enabling o...
Location:
United States; Canada, Sunnyvale; Toronto
Salary:
Not provided
Cerebras Systems
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Mathematics, or a related field
  • 8+ years of experience in large-scale software engineering, with a focus on deep learning or related domains
  • Proficiency in Python for building and maintaining scalable systems
  • Advanced proficiency in C++, with an emphasis on multi-threaded programming, performance optimization, and system-level development
  • Demonstrated experience driving cross-functional projects
  • Experience building and scaling large-scale inference systems for LLMs or multimodal models
  • Familiarity with LLM serving frameworks, such as vLLM, SGLang, and TensorRT-LLM
  • Solid understanding of software architectural patterns for large-scale, high-performance applications
  • Hands-on experience with ML frameworks, such as PyTorch, and a strong understanding of their underlying architectures
  • Strong problem-solving skills, with the ability to balance technical depth with practical implementation constraints
Job Responsibility:
  • Drive and provide technical guidance to a team of software engineers working on complex machine learning integration projects
  • Design and implement ML features (e.g., structured outputs, biased sampling, predicted outputs) that improve performance of generative AI models at inference time
  • Design and implement high-throughput, low-latency multimodal inference models that support delivery of image, audio, and video inputs and outputs
  • Maintain our scalable serving backend for handling many concurrent requests per minute
  • Scale our inference service by implementing detailed observability throughout the entire stack
  • Analyze and improve latency, throughput, memory usage, and compute efficiency on the service and the implementation of various features
  • Optimize software to accelerate generative LLM inference by achieving high throughput and low latency
  • Stay up-to-date with advancements in machine learning and deep learning, and apply state-of-the-art techniques to enhance our solutions
  • Evaluate trade-offs between different approaches, clearly articulate design choices, and develop detailed proposals for implementing new features
  • Uncover, scope, and prioritize significant areas of technical debt across the software stack to ensure continued high quality of the inference service
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Research Scientist Intern, Real-Time Multimodal AI

Reality Labs is building the future of connection through world-class AR/VR hard...
Location:
United States, Burlingame
Salary:
7650.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, Electrical Engineering, or a related field
  • 2+ years of research experience in one or more of the following areas: multimodal learning, vision-language models, large language models, or foundation model fine-tuning
  • Hands-on experience fine-tuning large foundation models (e.g., LLaVA, InternVL, Qwen-VL, LLaMA, or similar)
  • Strong programming skills in Python
  • Experience with deep learning frameworks such as PyTorch
  • Excellent communication skills and ability to work independently
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility:
  • Research and develop novel approaches for fine-tuning large multimodal foundation models (vision-language, audio-visual) for real-time applications
  • Design and implement efficient inference pipelines for deploying fine-tuned models in real-time communication scenarios
  • Explore agentic architectures that leverage fine-tuned models as tools within larger AI systems
  • Collaborate with cross-functional teams to integrate models into prototype experiences
  • Document and present research progress with the goal of publishing findings at top-tier ML/CV conferences
  • Contribute to building working prototypes that demonstrate the capabilities of fine-tuned multimodal models

Research Engineer, RealTime AI, MSL PAR

We are seeking research engineers to join the Product and Applied Research (PAR)...
Location:
United States, Bellevue, WA
Salary:
257000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 2+ years of industry experience in LLM/NLP, audio, or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross functional partners to impact, and/or influencing strategy across multiple teams
  • Skilled in model training, data, or inference & efficiency for LLMs
  • Experience building products/systems based on machine learning, reinforcement learning and/or deep learning methods
  • Programming experience in Python and hands-on experience with frameworks like PyTorch
Job Responsibility:
  • Collaborate with cross-functional teams to develop Meta’s AI Characters products
  • Lead the development of new algorithms and systems for LLM post-training, evaluation and efficiency
  • Support creative data sourcing, high-quality post-training data curation, and scale and optimize data pipelines for large language models (LLMs)
  • Develop and integrate models, orchestrations, and RAGs in production
  • Analyze and interpret experimental results, iterate on model architectures, and drive continuous improvement
  • Lead complex technical projects end-to-end
What we offer:
  • bonus
  • equity
  • benefits

Software Engineer 2

Microsoft Azure AI Inference platform is the next generation cloud business posi...
Location:
United States, Redmond
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science or a related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements for this role
  • Technical background with a solid foundation in software engineering principles, distributed computing, and system architecture
  • Experience working on high-scale, reliable online systems
  • Experience with real-time online services requiring low latency and high throughput
  • Experience working with Layer 7 (L7) network proxies and gateways
  • Knowledge of network architecture and concepts, including HTTP and TCP protocols, authentication, and session management
  • Knowledge and experience with OSS, Docker, Kubernetes, C++, Golang, or equivalent programming languages
  • Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
  • Ability to independently lead projects
Job Responsibility:
  • Design and implement core inference infrastructure for serving frontier AI models in production
  • Identify and drive improvements to end-to-end inference performance and efficiency of state-of-the-art LLMs and GenAI models from OpenAI, Anthropic and xAI hosted on AI Foundry
  • Design and implement efficient load scheduling and balancing strategies, by leveraging key insights and features of the model and workload
  • Scale the platform to support the growing inferencing demand and maintain high availability
  • Deliver critical capabilities required to serve the latest and greatest Gen AI models such as GPT5, Realtime audio, Sora, and enable fast time to market for them
  • Drive generic features to cater to the needs of customers such as GitHub, M365, Microsoft AI and third-party companies
  • Collaborate with our partners both internal and external
  • Embody Microsoft's Culture and Values
Employment Type:
Full-time

Senior Data Scientist

We are seeking a Senior Data Scientist with deep expertise in unstructured data ...
Location:
Taiwan
Salary:
Not provided
Beyond Limits
Expiration Date:
Until further notice
Requirements:
  • 5+ years of hands-on experience in AI, Machine Learning, and Data Science, with a strong focus on production-scale AI
  • Expertise in LLMs, including fine-tuning, distributed training, quantization, and pruning techniques
  • Experience working with OCR, ASR, and TTS applications in real-world deployments
  • Proven experience deploying AI models in production, with real-world examples of scaled AI applications
  • Strong understanding of cloud computing, containerization (Docker, Kubernetes), and ML Ops best practices
  • Proficiency in Python, PyTorch, and ML libraries
  • Hands-on experience with vector databases and retrieval-augmented generation (RAG) architectures
  • Strong awareness of AI system performance benchmarks (latency, speed, throughput) and ability to optimize models accordingly
  • Experience working with AI agents, designing real-world intelligent automation solutions beyond just open-source experimentation
  • Proficiency in transformer-based architectures (BERT, GPT, LLaMA, Whisper, etc.), including pre-training, fine-tuning, and task-specific adaptation
Job Responsibility:
  • Develop and deploy AI models for unstructured data (text, speech, audio, images) with a focus on enterprise-scale performance
  • Fine-tune, optimize, and deploy LLMs and multimodal models, integrating distributed training, quantization, and pruning techniques for efficiency
  • Design and implement production-ready AI solutions, ensuring scalability, low-latency inference, and high throughput
  • Work with AI agents and automation frameworks to create intelligent, real-world AI applications for enterprise clients
  • Build and maintain end-to-end LLM Ops pipelines, ensuring efficient training, deployment, monitoring, and model updates
  • Implement vector search and retrieval-augmented generation (RAG) systems for large-scale data solutions
  • Monitor AI performance using key metrics such as speed, latency, and throughput, continuously refining models for real-world efficiency
  • Work with cloud-based AI infrastructure (AWS, GCP) and containerized environments (Docker, Kubernetes) to scale AI solutions
  • Collaborate with engineering, DevOps, and product teams to align AI solutions with business needs and client requirements
  • Implement data curation pipelines, including data collection, cleaning, deduplication, decontamination, etc. for training high-quality AI models

Senior Machine Learning Engineer, Speech Recognition (ASR)

We are on a mission to ensure everyone has access to medical expertise, no matte...
Location:
Denmark, København
Salary:
Not provided
Life Science Talent
Expiration Date:
Until further notice
Requirements:
  • Strong programming skills in Python and the ability to contribute to production-grade codebases
  • Hands-on experience in speech recognition and ASR
  • Experience building ML systems that can be deployed and operated, including pipelines, CI and CD practices, and monitoring
  • Clear communication and collaboration skills across research, engineering, and product
  • A Master’s degree in computer science, engineering, mathematics, statistics, physics, or a related field, or equivalent professional experience
Job Responsibility:
  • Train and fine-tune ASR models at scale, including dataset strategy, augmentation, and domain adaptation to real-world clinical audio
  • Build and improve validation and evaluation frameworks, including WER and targeted analysis across speakers, environments, devices, and clinical terminology
  • Deploy and operate ASR inference services with focus on reliability, latency, and efficiency in production
  • Optimize inference latency and throughput, including batching strategies, model export choices, and hardware-aware profiling
  • Build and maintain APIs and services in frameworks like FastAPI, Kafka, and NVIDIA Triton, and deploy and run them on Kubernetes
  • Take technical ownership of core ASR components, shaping best practices for modelling, evaluation, and production reliability across the team and supporting the growth of engineers working on speech systems
  • Work closely with product and platform teams on safe rollouts, monitoring, and continuous improvement based on real-world feedback
What we offer:
  • Equipment provided by Corti
Employment Type:
Full-time

Senior MLOps Engineer - Data Ingestion - Paris

We are looking for a Senior MLOps Engineer to join the Panda Team (Data & ML Ope...
Location:
France, Paris
Salary:
Not provided
Doctolib
Expiration Date:
Until further notice
Requirements:
  • You have at least 7 years of experience as an MLOps Engineer or ML Platform Engineer with proven production model lifecycle management experience
  • You have expert-level experience with ML orchestration tools (MLflow, Braintrust, or similar) for batch processing and inference pipelines
  • You have a strong Site Reliability Engineering (SRE) foundation with focus on operations excellence, reliability, and observability
  • You have expertise in Python for automation and ML pipeline scripting
  • You have strong proficiency with infrastructure-as-code tools such as Terraform and container orchestration (Kubernetes)
  • You have experience with model evaluation frameworks and golden dataset management
  • You have a solid understanding of cloud infrastructure (preferably GCP, AWS, or Azure)
  • You have excellent problem-solving skills with focus on identifying and resolving infrastructure bottlenecks
  • You are fluent in English
Job Responsibility:
  • Design and implement end-to-end ML model pipelines in production (LLM and custom models) with robust deployment, evaluation, and monitoring frameworks
  • Own data pseudo-anonymization architecture within ingestion services, converting Tier 0 (personal identifiers) to Tier 1 (anonymized data) while ensuring data quality and model performance
  • Build and maintain secure data export services with ML-based threat detection to prevent attack vectors (SQL injection, etc.) using adaptive models rather than manual rules
  • Manage golden datasets and implement production model evaluation frameworks to ensure anonymization quality and system reliability
  • Build and maintain data pipelines that efficiently extract, transform, and load data from various sources, handling multiple data formats (text, images, audio, video)
  • Implement automation and orchestration tools using ML orchestration platforms (MLflow, Braintrust, or similar) to streamline infrastructure provisioning and reduce manual effort
  • Monitor data and ML platforms for performance, reliability, and security; identify and troubleshoot issues proactively
  • Mentor team members on MLOps expertise and best practices to reduce knowledge silos and build organizational capability
What we offer:
  • Free comprehensive health insurance for you and your children
  • 25 days of paid vacation per year, plus up to 14 days of RTT
  • Free mental health and coaching services through our partner Moka.care
  • Work from abroad for up to 10 days per year thanks to our flexibility days policy
  • Lunch vouchers (Swile card) worth €8.50 per working day, with €4.50 covered by Doctolib
  • A subsidy from the work council to refund part of the membership to a sport club or a creative class
  • 50% reimbursement of your public transport subscription
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Relocation support in case of international mobility
Employment Type:
Full-time