Plaud is building the future of AI-native productivity — where meetings, conversations, and knowledge become an extension of your memory. As the first dedicated owner of Memory Evaluation, you will shape how AI understands, stores and recalls human knowledge for millions of users worldwide.
Job Responsibilities:
Define how 'AI memory quality' should be measured for the industry
Turn product goals and real user scenarios into measurable evaluation criteria, evaluation frameworks and reproducible test cases
Build the global memory evaluation system from the ground up
Create evaluation standards, test pipelines, and datasets across markets (US, EU, Japan, China), working with real multilingual and multimodal data
Run continuous quality evaluations across Summary, Ask Plaud and other memory-enabled experiences
Become the source of truth for ‘is memory working?’
Benchmark against global AI meeting and knowledge products (Otter, Notion AI, NotebookLM, etc.) to extract best practices in memory accuracy, retrieval reliability, and user trust — and convert these insights into scientific evaluation methodology
Requirements:
Master's degree or higher, preferably in Computer Science, Data Science, AI, or a related field
3+ years of product experience, with at least 1 year working on AI or LLM-related products or features
Solid analytical skills and familiarity with data processing (SQL/Python preferred)
Ability to communicate clearly in English and collaborate across product, engineering, and operations
Strong ownership, structured problem solving, and willingness to learn quickly
Understanding of English-speaking user scenarios, product expectations, or cultural differences
Nice to have:
Hands-on experience designing test plans, evaluation standards, or datasets for LLM or AI features
Familiarity with evaluation of memory-related behaviors (extraction, retrieval, contextual usage), or prior work on summary/Ask AI products
Experience reviewing real user data and translating findings into actionable product improvements