Audio Inference Engineer, Model Efficiency Job at Cohere (New York)

Senior Inference ML Runtime Engineer

The Inference ML Engineering team at Cerebras Systems is dedicated to enabling o...

Location

United States; Canada , Sunnyvale; Toronto

Salary:

Not provided

Cerebras Systems

Expiration Date

Until further notice

Requirements

Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Mathematics, or a related field
8+ years of experience in large-scale software engineering, with a focus on deep learning or related domains
Proficiency in Python for building and maintaining scalable systems
Advanced proficiency in C++, with an emphasis on multi-threaded programming, performance optimization, and system-level development
Demonstrated experience driving cross-functional projects
Experience building and scaling large-scale inference systems for LLMs or multimodal models
Familiarity with LLM serving frameworks, such as vLLM, SGLang, and TensorRT-LLM
Solid understanding of software architectural patterns for large-scale, high-performance applications
Hands-on experience with ML frameworks, such as PyTorch, and a strong understanding of their underlying architectures
Strong problem-solving skills, with the ability to balance technical depth with practical implementation constraints

Job Responsibility

Drive and provide technical guidance to a team of software engineers working on complex machine learning integration projects
Design and implement ML features (e.g., structured outputs, biased sampling, predicted outputs) that improve performance of generative AI models at inference time
Design and implement high-throughput, low-latency multimodal inference models that support delivery of image, audio, and video inputs and outputs
Maintain our scalable serving backend for handling many concurrent requests per minute
Scale our inference service by implementing detailed observability throughout the entire stack
Analyze and improve latency, throughput, memory usage, and compute efficiency on the service and the implementation of various features
Optimize software to accelerate generative LLM inference by achieving high throughput and low latency
Stay up-to-date with advancements in machine learning and deep learning, and apply state-of-the-art techniques to enhance our solutions
Evaluate trade-offs between different approaches, clearly articulate design choices, and develop detailed proposals for implementing new features
Uncover, scope, and prioritize significant areas of technical debt across the software stack to ensure continued high quality of the inference service

What we offer

Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
A simple, non-corporate work culture that respects individual beliefs

Senior MLOps Engineer - Data Ingestion - Paris

We are looking for a Senior MLOps Engineer to join the Panda Team (Data & ML Ope...

Location

France , Paris

Salary:

Not provided

Doctolib

Expiration Date

Until further notice

Requirements

You have at least 7 years of experience as an MLOps Engineer or ML Platform Engineer, with proven production model lifecycle management experience
You have expert-level experience with ML orchestration tools (MLflow, Braintrust, or similar) for batch processing and inference pipelines
You have a strong Site Reliability Engineering (SRE) foundation with focus on operations excellence, reliability, and observability
You have expertise in Python for automation and ML pipeline scripting
You have strong proficiency with infrastructure-as-code tools such as Terraform and container orchestration (Kubernetes)
You have experience with model evaluation frameworks and golden dataset management
You have a solid understanding of cloud infrastructure (preferably GCP, AWS, or Azure)
You have excellent problem-solving skills with focus on identifying and resolving infrastructure bottlenecks
You are fluent in English

Job Responsibility

Design and implement end-to-end ML model pipelines in production (LLM and custom models) with robust deployment, evaluation, and monitoring frameworks
Own data pseudo-anonymization architecture within ingestion services, converting Tier 0 (personal identifiers) to Tier 1 (anonymized data) while ensuring data quality and model performance
Build and maintain secure data export services with ML-based threat detection to prevent attack vectors (SQL injection, etc.) using adaptive models rather than manual rules
Manage golden datasets and implement production model evaluation frameworks to ensure anonymization quality and system reliability
Build and maintain data pipelines that efficiently extract, transform, and load data from various sources, handling multiple data formats (text, images, audio, video)
Implement automation and orchestration tools using ML orchestration platforms (MLflow, Braintrust, or similar) to streamline infrastructure provisioning and reduce manual effort
Monitor data and ML platforms for performance, reliability, and security; identify and troubleshoot issues proactively
Mentor team members on MLOps expertise and best practices to reduce knowledge silos and build organizational capability

What we offer

Free comprehensive health insurance for you and your children
25 days of paid vacation per year, plus up to 14 days of RTT
Free mental health and coaching services through our partner Moka.care
Work from abroad for up to 10 days per year thanks to our flexibility days policy
Lunch vouchers (Swile card) worth €8.50 per working day, with €4.50 covered by Doctolib
A subsidy from the work council to refund part of the membership to a sport club or a creative class
50% reimbursement of your public transport subscription
Parent Care Program: receive one additional month of leave on top of the legal parental leave
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Relocation support in case of international mobility

Fulltime

Senior Machine Learning Engineer, Speech Recognition (ASR)

We are on a mission to ensure everyone has access to medical expertise, no matte...

Location

Denmark , København

Salary:

Not provided

Life Science Talent

Expiration Date

Until further notice

Requirements

Strong programming skills in Python and the ability to contribute to production-grade codebases
Hands-on experience in speech recognition and ASR
Experience building ML systems that can be deployed and operated, including pipelines, CI and CD practices, and monitoring
Clear communication and collaboration skills across research, engineering, and product
A Master’s degree in computer science, engineering, mathematics, statistics, physics, or a related field, or equivalent professional experience

Job Responsibility

Train and fine-tune ASR models at scale, including dataset strategy, augmentation, and domain adaptation to real-world clinical audio
Build and improve validation and evaluation frameworks, including WER and targeted analysis across speakers, environments, devices, and clinical terminology
Deploy and operate ASR inference services with focus on reliability, latency, and efficiency in production
Optimize inference latency and throughput, including batching strategies, model export choices, and hardware-aware profiling
Build and maintain APIs and services in frameworks like FastAPI, Kafka, and NVIDIA Triton, and deploy and run them on Kubernetes
Take technical ownership of core ASR components, shaping best practices for modelling, evaluation, and production reliability across the team, and supporting the growth of engineers working on speech systems
Work closely with product and platform teams on safe rollouts, monitoring, and continuous improvement based on real-world feedback

What we offer

Equipment provided by Corti

Fulltime

Software Engineer 2

Microsoft Azure AI Inference platform is the next generation cloud business posi...

Location

United States , Redmond

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science or a related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements for this role
Technical background with a solid foundation in software engineering principles, distributed computing, and system architecture
Experience working on high-scale, reliable online systems
Experience with real-time online services requiring low latency and high throughput
Experience working with Layer 7 (L7) network proxies and gateways
Knowledge of network architecture and concepts, including HTTP and TCP protocols, authentication, and session management
Knowledge and experience with OSS, Docker, Kubernetes, C++, Golang, or equivalent programming languages
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
Ability to independently lead projects

Job Responsibility

Design and implement core inference infrastructure for serving frontier AI models in production
Identify and drive improvements to end-to-end inference performance and efficiency of state-of-the-art LLMs and GenAI models from OpenAI, Anthropic, and xAI hosted on AI Foundry
Design and implement efficient load scheduling and balancing strategies, by leveraging key insights and features of the model and workload
Scale the platform to support the growing inferencing demand and maintain high availability
Deliver critical capabilities required to serve the latest and greatest Gen AI models such as GPT5, Realtime audio, Sora, and enable fast time to market for them
Drive generic features to cater to the needs of customers such as GitHub, M365, Microsoft AI and third-party companies
Collaborate with our partners both internal and external
Embody Microsoft's Culture and Values

Fulltime

Research Engineer, RealTime AI, MSL PAR

We are seeking research engineers to join the Product and Applied Research (PAR)...

Location

United States , Bellevue, WA

Salary:

257000.00 USD / Year

Research Scientist Intern, Real-Time Multimodal AI

Reality Labs is building the future of connection through world-class AR/VR hard...

Location

United States , Burlingame

Salary:

7650.00 - 12134.00 USD / Month

Director, Partnership Marketing, UFC & ZB

TKO is seeking a Director of Partnership Marketing to join our Global Partnershi...

Location

United States , New York

Salary:

135000.00 - 180000.00 USD / Year

UFC

Expiration Date

Until further notice

Requirements

8+ years of experience in marketing strategy, consulting, commercial operations, or corporate development, ideally in sports, entertainment, or media
Demonstrated success in leading cross-functional strategic initiatives and driving organizational change
Deep understanding of brand partnerships, marketing strategy, commercialization, and client service models
Strong commercial acumen, with proven experience negotiating and renewing long-term partnerships
Exceptional executive communication and stakeholder management skills
Experience building and mentoring high-performing teams
Bachelor’s degree in Business, Marketing, Economics, or related field required
MBA or equivalent advanced degree strongly preferred

Job Responsibility

Lead the end-to-end management of commercial partnerships for UFC & Zuffa Boxing to deliver on partners’ brand and business goals
Lead strategic planning for all partners, identifying and prioritizing growth opportunities aligned with business objectives, while delivering consultative solutions leveraging the power of the TKO enterprise
Collaborate with key internal stakeholders (including Marketing, Events, Community Relations, Media, PR, Brand, and Legal), serving as cross-departmental project lead to ensure alignment of partner programs with UFC & Zuffa Boxing initiatives
Act as on-site partnership lead at WWE events, including oversight of experiential integrations and partner experiences
Build and develop relationships with brand partner contacts, including client hosting at UFC & Zuffa Boxing events
Drive revenue by leading all renewals, upsells, and cross-sells for UFC & Zuffa Boxing while unlocking new revenue opportunities across the TKO enterprise
Support the VP in developing a high-performing, consultative, insight-driven strategic marketing team
Champion a culture of innovation, thought leadership, and proactive partner management within the Partnerships Marketing organization
Manage and mentor a high-performing team of strategists and cross-functional contributors, fostering a growth mindset and collaborative environment

What we offer

health care
retirement
vacation and other paid time off

Fulltime

Senior Platform Engineer

We're looking for ambitious Senior Platform or DevOps Engineers who thrive on so...

Location

United Kingdom , London

Salary:

75000.00 - 90000.00 GBP / Year

Linux Recruit

Expiration Date

Until further notice

Requirements

Cloud experience: AWS, with Azure and GCP as a bonus
Infrastructure as code: Terraform
CI/CD & automation: GitHub Actions, Jenkins or similar
Kubernetes & containers (EKS/AKS)
Observability & reliability: Prometheus, Grafana, OpenTelemetry
Platform thinking: internal developer platforms, self-service tooling
Excellent communication and stakeholder engagement skills
Passion for learning new technologies and continuously developing skills

Job Responsibility

Designing, building, and maintaining scalable, reliable cloud platforms
Delivering infrastructure as code using AWS and Terraform
Supporting CI/CD pipelines and automated deployment processes
Working on cross-functional engineering projects within agile teams
Troubleshooting and resolving complex issues in high-availability production environments
Collaborating with developers, architects, and stakeholders to deliver modern digital services
Contributing to platform improvements, innovation, and engineering best practices
Helping shape the future of digital services within the UK Public Sector

Fulltime
