Audio Inference Engineer, Model Efficiency

Cohere

Location:
United States; Canada, New York

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Our team is a fast-growing group of committed researchers and engineers. The mission of the team is to build reliable machine learning systems and optimize audio inference serving efficiency using innovative techniques. As an engineer on this team, you will work on advancing core audio model serving metrics, including latency, throughput, and quality by diving deep into our systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads. You’ll collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment, with a special focus on real-time and streaming audio inference.

Job Responsibility:

  • Work on advancing core audio model serving metrics, including latency, throughput, and quality by diving deep into our systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads
  • Collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment, with a special focus on real-time and streaming audio inference

Requirements:

  • Significant experience developing high-performance audio or machine learning inference systems
  • Proficiency with programming languages such as C++ and Python
  • Hands-on experience with deep learning models for audio, speech, or language applications
  • A bias for action and a strong results-oriented mindset

Nice to have:

  • GPU programming, low-level system optimization, model parallelization techniques over multiple GPUs
  • Experience with duplex real-time streaming architectures
  • Internals of machine learning frameworks for audio (such as PyTorch, TensorFlow, or specialized audio libraries)
  • Experience with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or custom distributed inference systems
  • Sequence modeling (e.g., transformers for audio/speech) and end-to-end audio pipeline optimization

What we offer:

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Audio Inference Engineer, Model Efficiency

Senior Inference ML Runtime Engineer

The Inference ML Engineering team at Cerebras Systems is dedicated to enabling o...
Location:
United States; Canada, Sunnyvale; Toronto
Salary:
Not provided
Cerebras Systems
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Mathematics, or a related field
  • 8+ years of experience in large-scale software engineering, with a focus on deep learning or related domains
  • Proficiency in Python for building and maintaining scalable systems
  • Advanced proficiency in C++, with an emphasis on multi-threaded programming, performance optimization, and system-level development
  • Demonstrated experience driving cross-functional projects
  • Experience building and scaling large-scale inference systems for LLMs or multimodal models
  • Familiarity with LLM serving frameworks, such as vLLM, SGLang, and TensorRT-LLM
  • Solid understanding of software architectural patterns for large-scale, high-performance applications
  • Hands-on experience with ML frameworks, such as PyTorch, and a strong understanding of their underlying architectures
  • Strong problem-solving skills, with the ability to balance technical depth with practical implementation constraints
Job Responsibility:
  • Drive and provide technical guidance to a team of software engineers working on complex machine learning integration projects
  • Design and implement ML features (e.g., structured outputs, biased sampling, predicted outputs) that improve performance of generative AI models at inference time
  • Design and implement high-throughput, low-latency multimodal inference models that support delivery of image, audio, and video inputs and outputs
  • Maintain our scalable serving backend for handling many concurrent requests per minute
  • Scale our inference service by implementing detailed observability throughout the entire stack
  • Analyze and improve latency, throughput, memory usage, and compute efficiency on the service and the implementation of various features
  • Optimize software to accelerate generative LLM inference by achieving high throughput and low latency
  • Stay up-to-date with advancements in machine learning and deep learning, and apply state-of-the-art techniques to enhance our solutions
  • Evaluate trade-offs between different approaches, clearly articulate design choices, and develop detailed proposals for implementing new features
  • Uncover, scope, and prioritize significant areas of technical debt across the software stack to ensure continued high quality of the inference service
What we offer:
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open-source cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • A simple, non-corporate work culture that respects individual beliefs

Research Scientist Intern, Real-Time Multimodal AI

Reality Labs is building the future of connection through world-class AR/VR hard...
Location:
United States, Burlingame
Salary:
7650.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, Electrical Engineering, or a related field
  • 2+ years of research experience in one or more of the following areas: multimodal learning, vision-language models, large language models, or foundation model fine-tuning
  • Hands-on experience fine-tuning large foundation models (e.g., LLaVA, InternVL, Qwen-VL, LLaMA, or similar)
  • Strong programming skills in Python
  • Experience with deep learning frameworks such as PyTorch
  • Excellent communication skills and ability to work independently
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Job Responsibility:
  • Research and develop novel approaches for fine-tuning large multimodal foundation models (vision-language, audio-visual) for real-time applications
  • Design and implement efficient inference pipelines for deploying fine-tuned models in real-time communication scenarios
  • Explore agentic architectures that leverage fine-tuned models as tools within larger AI systems
  • Collaborate with cross-functional teams to integrate models into prototype experiences
  • Document and present research progress with the goal of publishing findings at top-tier ML/CV conferences
  • Contribute to building working prototypes that demonstrate the capabilities of fine-tuned multimodal models

Research Engineer, RealTime AI, MSL PAR

We are seeking research engineers to join the Product and Applied Research (PAR)...
Location:
United States, Bellevue, WA
Salary:
257000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 2+ years of industry experience in LLM/NLP, audio, or related AI/ML models
  • Experience as a formal technical lead, leading major technical initiatives with cross-functional partners to impact, and/or influencing strategy across multiple teams
  • Skilled in model training, data, or inference & efficiency for LLMs
  • Experience building products/systems based on machine learning, reinforcement learning and/or deep learning methods
  • Programming experience in Python and hands-on experience with frameworks like PyTorch
Job Responsibility:
  • Collaborate with cross-functional teams to develop Meta’s AI Characters products
  • Lead the development of new algorithms and systems for LLM post-training, evaluation and efficiency
  • Support creative data sourcing, high-quality post-training data curation, and scale and optimize data pipelines for large language models (LLMs)
  • Develop and integrate models, orchestrations, and RAGs in production
  • Analyze and interpret experimental results, iterate on model architectures, and drive continuous improvement
  • Lead complex technical projects end-to-end
What we offer:
  • bonus
  • equity
  • benefits

Software Engineer 2

Microsoft Azure AI Inference platform is the next generation cloud business posi...
Location:
United States, Redmond
Salary:
100600.00 - 199000.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science or a related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements for this role
  • Technical background with a solid foundation in software engineering principles, distributed computing, and system architecture
  • Experience working on high-scale, reliable online systems
  • Experience with real-time online services requiring low latency and high throughput
  • Experience working with Layer 7 (L7) network proxies and gateways
  • Knowledge of network architecture and concepts, including HTTP and TCP protocols, authentication, and session management
  • Knowledge and experience with OSS, Docker, Kubernetes, C++, Golang, or equivalent programming languages
  • Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
  • Ability to independently lead projects
Job Responsibility:
  • Design and implement core inference infrastructure for serving frontier AI models in production
  • Identify and drive improvements to end-to-end inference performance and efficiency of state-of-the-art LLMs and GenAI models from OpenAI, Anthropic, and xAI hosted on AI Foundry
  • Design and implement efficient load scheduling and balancing strategies, by leveraging key insights and features of the model and workload
  • Scale the platform to support the growing inferencing demand and maintain high availability
  • Deliver critical capabilities required to serve the latest and greatest Gen AI models such as GPT5, Realtime audio, Sora, and enable fast time to market for them
  • Drive generic features to cater to the needs of customers such as GitHub, M365, Microsoft AI and third-party companies
  • Collaborate with our partners both internal and external
  • Embody Microsoft's Culture and Values
Employment Type:
Fulltime

Senior Data Scientist

We are seeking a Senior Data Scientist with deep expertise in unstructured data ...
Location:
Taiwan
Salary:
Not provided
Beyond Limits
Expiration Date:
Until further notice
Requirements:
  • 5+ years of hands-on experience in AI, Machine Learning, and Data Science, with a strong focus on production-scale AI
  • Expertise in LLMs, including fine-tuning, distributed training, quantization, and pruning techniques
  • Experience working with OCR, ASR, and TTS applications in real-world deployments
  • Proven experience deploying AI models in production, with real-world examples of scaled AI applications
  • Strong understanding of cloud computing, containerization (Docker, Kubernetes), and ML Ops best practices
  • Proficiency in Python, PyTorch, and ML libraries
  • Hands-on experience with vector databases and retrieval-augmented generation (RAG) architectures
  • Strong awareness of AI system performance benchmarks (latency, speed, throughput) and ability to optimize models accordingly
  • Experience working with AI agents, designing real-world intelligent automation solutions beyond just open-source experimentation
  • Proficiency in transformer-based architectures (BERT, GPT, LLaMA, Whisper, etc.), including pre-training, fine-tuning, and task-specific adaptation
Job Responsibility:
  • Develop and deploy AI models for unstructured data (text, speech, audio, images) with a focus on enterprise-scale performance
  • Fine-tune, optimize, and deploy LLMs and multimodal models, integrating distributed training, quantization, and pruning techniques for efficiency
  • Design and implement production-ready AI solutions, ensuring scalability, low-latency inference, and high throughput
  • Work with AI agents and automation frameworks to create intelligent, real-world AI applications for enterprise clients
  • Build and maintain end-to-end LLM Ops pipelines, ensuring efficient training, deployment, monitoring, and model updates
  • Implement vector search and retrieval-augmented generation (RAG) systems for large-scale data solutions
  • Monitor AI performance using key metrics such as speed, latency, and throughput, continuously refining models for real-world efficiency
  • Work with cloud-based AI infrastructure (AWS, GCP) and containerized environments (Docker, Kubernetes) to scale AI solutions
  • Collaborate with engineering, DevOps, and product teams to align AI solutions with business needs and client requirements
  • Implement data curation pipelines, including data collection, cleaning, deduplication, decontamination, etc. for training high-quality AI models

Project Controls Coordinator III

Under the direction of the Supervisor Project Controls, the Analyst will perform...
Location:
Canada, North York
Salary:
55.00 - 58.00 CAD / Hour
Randstad
Expiration Date:
June 03, 2026
Requirements:
  • Four Year Degree or combination of education and related experience
  • Minimum of 5 years of Project Controls or Project Management experience
  • Project Management professional designation is preferred
  • Strong analytical skills, including Earned Value Management
  • An independent worker within a team setting
  • Demonstrated professional engagement at a high level with work group, stakeholders, and contractors in a team setting
  • Proficient in the use of SAP, Oracle and MS office suite, intermediate+ Excel skills
  • Excellent communication, interpersonal, and organizational skills
  • Ability to effectively manage and prioritize workload, bring issues forward and develop working relationships at all levels of the organization
  • Detail oriented and understands the importance of data reconciliation
Job Responsibility:
  • Analyze and maintain the project costs at the WBS level including control budget, incurred costs, commitments, and forecast
  • Provide the project team with accurate and timely cost information and reporting
  • Perform earned value measurements to anticipate forecast impacts
  • Perform monthly project close processes and prepare monthly project reports and comparative capital cost estimates for the project in Excel and EcoSys
  • Prepare and document project change orders in a timely manner, in accordance with Project Management Office standards
  • Engage the Project Managers in meetings and discussions to review and reforecast project costs
  • Review cost transactions to ensure accurate project costs
  • Communicate with larger Controls team for the project
  • Liaise with Project Managers and Field Cost Analysts to ensure engagement with the project progress, changes, highlights and issues
  • Maintain the project Work Breakdown Structure such that it facilitates project execution and cost control during project execution and meets accounting requirements for asset creation and project closeout
Employment Type:
Fulltime

Mechanical Engineer - Energy Solutions

Join a Team of engineers dedicated to working hand-in-hand with large manufactur...
Location:
Canada, North York
Salary:
65.00 - 68.00 CAD / Hour
Randstad
Expiration Date:
May 09, 2026
Requirements:
  • Engineering Degree preferred, Chemical or Mechanical Engineering preferred
  • Membership in Professional Engineers of Ontario or similar professional organization is preferred
  • Proven skills in leading and influencing without explicit authority, and in time management
  • Ability to work independently while also working within a team of like-minded professionals
  • Valid driver’s license with a responsible driving record is needed
Job Responsibility:
  • Identify new contacts at large manufacturing facilities for the purpose of arranging site visits
  • Attend joint-site visits with team members to support in the identification and quantification of potential energy savings projects
  • Balance multiple priorities: effectively manage time and priorities, consistently delivering on firm annual savings targets
  • Quantify impact and secure buy-in: Build technical savings calculations, sometimes from scratch, to support project justification and persuade key stakeholders on execution of work
  • Provide solutions to complex problems: expertly analyze complex operations across various industries and synthesize available information to create solutions equally appealing to business and technical people
  • Forge long-term customer relationships: build and nurture professional relationships founded on unwavering trust and mutual respect, being a first-choice energy efficiency partner for your customers
  • Continuous growth and curious mindset: Proactively identify new savings opportunities to drive both short and long-term work and build a sales funnel for sustained growth
  • Drive results autonomously while thriving in a collaborative environment: play a supporting role in managing a small group customer base, and integrate into and support the broader team to achieve personal and collective objectives
What we offer:
  • Hybrid work model: in-office (Monday, Tuesday & Thursday), remote (Wednesday & Friday)