Multimodal Speech Engineer

1X Technologies

Location:
United States, Palo Alto

Contract Type:
Not provided

Salary:

150000.00 - 250000.00 USD / Year

Job Description:

The AI Companion team creates the speech interface for NEO, as well as the physical awareness behaviors that evoke trust, warmth, and competence when NEO interacts with people. As a Multimodal Speech Engineer on the AI Companion Team, you will lead the effort to create a conversational speech model, from design to data collection to deployment. You will develop real-time architectures that enable NEO not only to converse with users, but also to incorporate other modalities like vision, spatial audio, and body language. You will work closely with the design team to reflect NEO’s personality and 1X’s brand values in the way NEO speaks and responds to users, and with the autonomy team to ensure that NEO’s speech models are aware of its physical capabilities.

Job Responsibility:

  • Design and implement data pipelines for large-scale speech interactions from NEO data and external datasets
  • Train speech-to-speech models to be aware of NEO’s embodiment
  • Design appropriate responses for a variety of user queries
  • Synchronize speech with body language
  • Customize NEO with different personalities

Requirements:

  • 3+ years of experience in speech and audio modeling domains
  • Experience in multi-modal conversational models (language, audio, vision) is a strong plus
  • Ability to take open-ended problems in conversation models, come up with creative solutions, implement proof-of-concepts, and translate those to production

Nice to have:

  • Experience in multi-modal conversational models (language, audio, vision)

What we offer:
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays

Additional Information:

Job Posted:
December 01, 2025

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Multimodal Speech Engineer

Multimodal Speech Engineer, AI Companion

As a Multimodal Speech Engineer on the AI Companion Team, you will lead the deve...
Location:
United States, Palo Alto
Salary:
150000.00 - 250000.00 USD / Year
1X Technologies
Expiration Date
Until further notice
Requirements
  • 3+ years of experience in speech and audio modeling domains
  • Experience with multi-modal conversational models (language, audio, vision)
  • Ability to take open-ended problems in conversation modeling, develop creative solutions, build proof-of-concepts, and scale them to production
Job Responsibility
  • Design and implement data pipelines for large-scale speech interactions using internal and external datasets
  • Train speech-to-speech models that incorporate awareness of NEO’s physical form
  • Create dynamic responses for a wide range of user queries
  • Synchronize NEO’s speech with physical gestures and body language
  • Customize NEO’s speech behavior to reflect different personalities
What we offer
  • Equity
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime

Engineering Manager, Multimodal (API)

OpenAI is seeking an Engineering Manager to lead our multimodal API product suit...
Location:
United States, San Francisco
Salary:
293000.00 - 385000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements
  • Proven experience managing engineering teams that deliver complex, high-quality products at scale
  • Strong technical background and proficiency in modern software engineering practices and system architecture
  • Excellent collaboration and communication skills to effectively coordinate across diverse teams and stakeholders
  • Familiarity with or strong interest in multimodal AI, including speech technologies, real-time systems, and image generation
  • Ability to operate effectively in a fast-paced, ambiguous startup environment
Job Responsibility
  • Build, mentor, and grow a high-performing engineering team focused on multimodal API products
  • Collaborate closely with product managers, designers, and other stakeholders to define the strategic vision and product roadmap
  • Work closely with our research teams to improve our core multimodal models for API customer use cases
  • Guide technical and architectural decisions, emphasizing scalability, robustness, and user experience
  • Foster a culture of innovation, continuous improvement, and accountability within your team
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Fulltime

AI Engineer - Speech & NLP

Interhuman AI is building the next generation of social intelligence infrastruct...
Location:
Denmark, København
Salary:
55000.00 - 65000.00 DKK / Year
Life Science Talent
Expiration Date
Until further notice
Requirements
  • PhD in Machine Learning, Computer Science, or a related field with a focus on speech processing and/or NLP
  • Track record of building and shipping models
  • Strong proficiency in Python and deep experience with PyTorch (or JAX/TensorFlow)
  • Familiarity with the current landscape of speech and multimodal models (e.g., Whisper, audio-LLMs, speech encoders, vision-language models)
  • You thrive with ambiguity. You can scope your own work, prioritize ruthlessly, and know when to ask for input
  • Clear communicator—you can explain a complex architecture to both engineers and non-technical stakeholders
Job Responsibility
  • Design, train, and iterate on speech and language models that extract social and emotional signals from live conversation
  • Own the full model development lifecycle—from data curation and architecture design through training, evaluation, and production deployment
  • Build evaluation frameworks and benchmarks that capture the subtleties of human interaction that standard metrics miss
  • Stay at the frontier of multimodal research and translate relevant advances into our production stack
  • Collaborate closely with engineering to ensure models meet real-time latency and scalability requirements
What we offer
  • Competitive salary and meaningful equity in an early-stage, venture-backed company
  • Direct influence on technical direction—your work shapes the product, not just a feature
  • A small, focused team where your contributions are visible and impactful from day one
  • Flexibility on location and working arrangements
  • Fulltime

Research Intern - GenAI

Appen is seeking Research Interns to support innovative research in Generative A...
Location:
Australia, Chatswood, Sydney
Salary:
Not provided
Appen
Expiration Date
Until further notice
Requirements
  • Postgraduate students in Linguistics, Computer Science, AI, Data Science, or similar disciplines preferred; strong final-year and recent undergraduate candidates in these fields will also be considered
  • Familiarity with programming languages such as Python, R, or similar tools used in data analysis and machine learning
  • Experience with data annotation, model evaluation, or prompt engineering
  • Understanding of multilingual NLP, speech technologies, or agentic AI systems
  • Strong written communication skills, especially for summarizing research and drafting technical content
  • Ability to work independently and collaboratively in a remote research environment
Job Responsibility
  • Conduct literature reviews on topics such as adversarial prompting, multilingual evaluation, and agentic AI
  • Assist in dataset curation, annotation, and quality assurance for speech, text, and multimodal data
  • Support model evaluation experiments, including prompt engineering and red teaming
  • Develop scripts and tools for data analysis, visualization, and automation
  • Contribute to internal documentation, research reports, and thought leadership content
  • Participate in team meetings and cross-functional collaborations
  • Help prepare materials for conferences, publications, and workshops
What we offer
  • Hands-on experience in applied AI research with real-world impact
  • Mentorship from experienced researchers and exposure to industry workflows
  • Opportunities to contribute to publications, datasets, and thought leadership
  • A collaborative and inclusive research environment

Senior Member of Technical Staff, Multimodal AI

At Cohere, we believe in the power of multimodal AI to revolutionise the way we ...
Location:
Salary:
Not provided
Cohere
Expiration Date
Until further notice
Requirements
  • Exceptional software engineering skills with a proven track record of building robust and scalable systems
  • Strong command of Python and well-versed in popular deep learning frameworks like JAX, PyTorch, and TensorFlow, with an understanding of their multimodal capabilities
  • Knowledge of distributed training strategies, especially for large-scale multimodal models
  • Familiarity with autoregressive models, particularly their application in multimodal tasks such as image or video captioning, speech-to-text generation
Job Responsibility
  • Design and develop cutting-edge multimodal AI systems, integrating various modalities such as text, speech, and vision
  • Conduct research and experiments on our advanced compute infrastructure, exploring novel ideas in multimodal representation learning, transfer learning, and more
  • Collaborate closely with our world-class teams, learning from and contributing to their expertise in the field
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
  • Fulltime

Director, Digital Ecosystem Applications

This position is responsible for the Software Platforms group at the Innovation ...
Location:
United States, Belmont
Salary:
240000.00 - 285000.00 USD / Year
Volkswagen AG
Expiration Date
Until further notice
Requirements
  • 10+ years with 2+ years in a technical leadership role
  • CS, EE, M.S. Engineering (or equivalent) REQUIRED
  • M.S. Engineering (or equivalent) or PhD PREFERRED
  • Analytical and conceptual thinking – using logic and reason, creative and strategic
  • Communication skills – interpersonal, presentation and written
  • Managing interdisciplinary teams on individual projects
  • Integration – joining people, processes or systems
  • Influencing and negotiation skills
  • Problem solving
  • Resource management
Job Responsibility
  • Define the technical mission, architecture strategy, and long‑term platform vision for the In‑Vehicle Computing & Digital Ecosystem Applications team, spanning Android Automotive OS (AAOS), in‑vehicle compute platforms, Software‑Defined Vehicle (SDV) architecture, and AI‑driven cockpit intelligence
  • Provide technical leadership across the full software stack, including Android Framework, System Services, HAL layers, middleware, connectivity stacks, media/audio frameworks, HMI toolchains, and cloud‑connected AI runtimes within an SDV‑aligned architecture
  • Lead and mentor engineering teams in platform bring‑up, system integration, performance optimization, and development of AI‑agentic features, multimodal interaction models, and next‑generation speech technologies
  • Manage multi‑year budgets for platform development, AI integration, SDV‑aligned compute evolution, SoC evaluations, cloud services, and prototype programs
  • Deliver executive‑level technical reporting on architecture decisions, platform readiness, SDV integration milestones, AI progress, risks, and strategic recommendations
  • Drive strategic planning for ICC’s infotainment and cockpit portfolio, including AAOS evolution, hybrid cloud/edge AI pipelines, intelligent mobile agent technologies, and SDV‑centric software and compute roadmaps
  • Align technical roadmaps with global VW Group Innovation teams across infotainment, connectivity, AI/ML, vehicle architecture, cloud services, and SDV platform strategy, ensuring cross‑platform consistency and shared component reuse
  • Build strategic relationships with SoC vendors, Tier‑1 suppliers, cloud providers, and AI technology partners to influence cockpit compute and SDV platform evolution
  • Maintain partnerships with Silicon Valley companies specializing in AI runtimes, LLMs, speech, multimodal interaction, and automotive‑grade SDV‑compatible software frameworks
  • Collaborate with academic and research institutions on AI‑agentic systems, embedded ML, HMI, and in‑vehicle compute architectures aligned with SDV principles
What we offer
  • Eligibility for annual performance bonus
  • Healthcare benefits
  • 401(k), with company match
  • Defined contribution retirement program
  • Tuition reimbursement
  • Company lease car program
  • Paid time off
  • Fulltime

Senior Software Engineer - Product

Frontier Foundry (F²) is IDC’s bold new innovation engine — a design-led, full-s...
Location:
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 7+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Hands-on experience with Nuance voice technologies or similar platforms (e.g., Azure Speech, Dialogflow, Alexa Skills Kit)
  • Deep understanding of Voice Access systems, accessibility APIs, and assistive technologies
  • Strong proficiency in full-stack development, especially client-side application engineering and user-facing experiences
  • Experience with GitHub Copilot, Copilot Studio, AI Foundry, or equivalent vibe coding/generative AI tools
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Design, build, and deliver high-quality software components aligned to F² charters: interaction models (inking, stylus, display tech), multimodal innovation (sensor fusion, voice/touch interfaces), or AI agents (context-aware, task-oriented)
  • Integrate and optimize Nuance Conversational AI technologies (e.g., speech-to-text, text-to-speech, NLU) into multimodal experiences
  • Enhance Voice Access capabilities across platforms, ensuring accessibility, responsiveness, and seamless user interaction
  • Work across the stack — from UI to backend — with a bias for impact and iteration
  • Embrace “vibe coding” using AI-assisted tools like GitHub Copilot, Copilot Studio, AI Foundry, and other generative AI tools to reduce boilerplate and drive intelligent test automation
  • Collaborate with product, design, and partner teams to shape backlog priorities and deliver intuitive, high-impact experiences
  • Navigate evolving priorities with ingenuity, turning loosely defined ideas into tangible software outcomes
  • Contribute to architecture discussions, code reviews, and prototyping efforts
  • Foster a culture of agility, experimentation, and outcome-driven development
  • Fulltime

Lead Applied ML Engineer

Lead Applied ML Engineer, Technology and Digital, FT, 09A-5:30P
Location:
United States, Remote
Salary:
144000.00 - 186000.00 USD / Year
Baptist Health
Expiration Date
Until further notice
Requirements
  • Master's degree in Computer Science/Machine Learning or a minimum of 10 years equivalent professional experience
  • Must have experience in GCP
  • Proven team leadership background in machine learning and artificial intelligence with expertise in one or more of: computer vision, NLP, speech, optimization, deep learning, reinforcement learning, time series, generative models, signals, and distributed systems
  • Strong proficiency in ML modeling frameworks
  • Strong expertise in overall software development approach
  • Significant leadership experience in building end to end data systems
  • Advanced software engineering skills with proven experience crafting, prototyping, and delivering advanced algorithmic solutions
  • Proficiency in one or more machine learning languages (e.g., Python) and development environments such as AWS SageMaker
  • Minimum Required Experience: 10 years
Job Responsibility
  • Lead AI Implementation: Drive the end-to-end development of production-grade AI solutions, from LLM orchestration and backend APIs to interactive UI prototypes and automated deployment pipelines
  • Full-Stack Ownership: Take accountability for the technical lifecycle of AI products, ensuring they are scalable, secure, and seamlessly integrated into healthcare workflows
  • GenAI & Advanced Modeling: Develop and deploy advanced Generative AI applications using RAG patterns and model fine-tuning; architect orchestration layers and agentic workflows to ensure vendor-agnostic, autonomous problem-solving
  • Full-Stack Development & Prototyping: Build robust Python-based backends and scalable APIs; create interactive user interfaces (POCs) to visualize AI reasoning and gather clinical stakeholder feedback
  • Data & Infrastructure Integration: Integrate AI solutions with cloud data warehouses (e.g., Snowflake) and manage containerized deployments (Docker) via automated CI/CD and GitOps pipelines (GitLab, ArgoCD) on GCP
  • Governance, Security, & Monitoring: Engineer automated guardrails for PII/PHI masking and risk mitigation; implement observability tools to monitor model drift, hallucination rates, and token-based cost metrics (FinOps)
  • Safety & Interoperability: Validate clinical logic using advanced evaluation frameworks (e.g., RAGAS) and ensure seamless EHR integration through healthcare data standards like FHIR and HL7
  • Fulltime