LMArena is looking for a Machine Learning Scientist to lead our open-source research, including open dataset and code releases, and to advance how the world evaluates and understands AI models in the open. You’ll design, run, and share new methods and experiments that reveal what makes models useful, trustworthy, and capable, grounded in human preference signals and released openly for the full ecosystem and research community to build upon.
Job Responsibilities:
Design and conduct experiments to evaluate AI model behavior across reasoning, style, robustness, and user preference dimensions
Develop new metrics, methodologies, and evaluation protocols that go beyond traditional benchmarks
Analyze large-scale human voting and interaction data to uncover insights into model performance and user preferences
Communicate results with the broader research community via academic papers, educational content, and conference talks
Collaborate with engineers to implement and scale research findings into production systems
Prototype and test research ideas rapidly, balancing rigor with iteration speed
Partner with model providers to shape evaluation questions and support responsible model testing
Contribute to the scientific integrity and transparency of the LMArena leaderboard and tools
Requirements:
PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field
Strong understanding of LLMs and modern deep learning architectures and methods (e.g., Transformers, diffusion models, reinforcement learning from human feedback)
Proficiency in Python and ML research libraries such as PyTorch, JAX, or TensorFlow
Demonstrated ability to design and analyze experiments with statistical rigor
Experience publishing research or working on open-source projects in ML, NLP, or AI evaluation
Comfortable working with real-world usage data and designing metrics beyond standard benchmarks
Ability to translate research questions into practical systems and collaborate across engineering and product teams
Passion for open science, reproducibility, and community-driven research
Nice to have:
Skill in public speaking, writing, and presenting research to diverse audiences
Active participation in conferences, panels, and online forums to foster relationships and thought leadership
Ability to build trust through transparent communication and consistent community engagement
Willingness to serve as a go-to contact for external researchers, journalists, and partners
What we offer:
Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs
The opportunity to work on cutting-edge AI with a small, mission-driven team
A culture that values transparency, trust, and community impact
Competitive compensation and equity aligned to the markets where our team members are based