Join Walmart and your work could help over 275 million global customers live better every week. Yes, we are the Fortune #1 company. But you’ll quickly find we’re a company who wants you to feel comfortable bringing your whole self to work. A career at Walmart is where the world’s most complex challenges meet a kinder way of life. Our mission spreads far beyond the walls of our stores. Join us and you'll discover why we are a world leader in diversity and inclusion, sustainability, and community involvement. From day one, you’ll be empowered and equipped to do the best work of your life.
Job Responsibilities:
Design Multi-Modal Evaluation Frameworks: Develop and validate novel evaluation metrics for non-deterministic outputs, specifically video, image, 3D assets, and audio.
Build 'AI-as-a-Judge' Systems: Fine-tune Vision-Language Models (VLMs) and Reward Models to serve as automated evaluators, creating scalable proxies for human judgment.
Lead Experimentation & Causal Inference: Design and analyze A/B tests to measure the downstream business impact of GenAI content; apply causal inference techniques to understand how specific asset attributes drive user engagement.
Orchestrate Human-in-the-Loop (RLHF) Strategy: Define protocols for human evaluation, managing the relationship with annotation partners to create high-quality 'Golden Sets' for benchmarking and Reinforcement Learning from Human Feedback (RLHF).
Strategic Cross-Functional Partnership: Collaborate with ML Engineers and Product Managers to establish 'Go/No-Go' model launch criteria based on latency, safety, and perceptual quality standards.
Research & Innovation: Stay current with state-of-the-art research in perceptual quality (e.g., FID, CLIP scores, VQA) and implement advanced techniques to detect hallucinations, artifacts, or bias in generated content.
Requirements:
Master's degree in Computer Science with a specialization in Computer Vision, Machine Learning, or equivalent practical experience.
3+ years of experience with machine learning algorithms and tools.
Strong foundation in statistical analysis, experimental design (A/B testing), and causal inference.
Hands-on experience with Generative AI evaluation (e.g., using LLMs/VLMs for evaluation, computing FID/IS/CLIP scores, or designing perceptual studies).
Proficiency in Python and deep learning frameworks (PyTorch, TensorFlow) for analyzing model outputs and building evaluation pipelines.
Experience processing unstructured data (image, video, 3D meshes) for analytical purposes.
Nice to have:
PhD in Machine Learning, Computer Science, or a related technical field.
Experience designing Reward Models for RLHF pipelines.
Deep understanding of 3D geometry processing (meshes, point clouds) and how to mathematically quantify '3D quality' (e.g., mesh manifoldness, texture resolution).
Experience with crowdsourcing platforms and designing instructions for subjective human evaluation.
Publication record or practical experience in Computational Photography, Computer Vision Quality Assessment, or Psychophysics.
Experience with Big Data tools (Spark, SQL, BigQuery) for analyzing large-scale experiment results.