Develop, evaluate, and iterate on NLP and LLM-based systems, including text classification, information extraction, and context retrieval pipelines
Build measurement and evaluation frameworks — both offline and online — to assess where and why systems are underperforming and quantify the impact of improvements
Develop golden test datasets and define methodologies for creating and maintaining them over time, including designing annotation guidelines and ensuring label quality
Select and apply the right approach for each language task (prompt engineering, fine-tuning, or classical NLP methods) based on the problem and available data, drawing on modern retrieval and RAG architectures and LLM evaluation methodologies
Perform structured analysis of system performance to surface failure modes, data gaps, and high-value areas for investment, applying sound statistical reasoning to evaluation results
Partner with engineers to support deployment, integration, and monitoring of ML and AI systems in production
Contribute to standards and best practices around deploying, evaluating, and monitoring text and language-based ML systems
Document work clearly and maintain knowledge artifacts that make systems understandable and maintainable over time
Collaborate with senior data scientists and cross-functional partners to translate business needs into well-scoped technical solutions, including communicating findings and recommendations to non-technical stakeholders
Requirements
Bachelor’s or Master’s degree in a quantitative field (computer science, statistics, linguistics, or related) and 2–4 years of applied ML or data science experience, or equivalent practical experience
Hands-on experience building or improving NLP or LLM-based systems in applied settings
Familiarity with text classification, information extraction, or other NLP tasks — and an understanding of where these systems fail
Experience with both prompt engineering and fine-tuning approaches for language tasks, with the judgment to know when to apply each
Familiarity with modern retrieval strategies and RAG architectures and how they affect LLM system performance
Experience with Hugging Face Transformers for text classification or related NLP tasks
Experience contributing to evaluation frameworks, test sets, or performance diagnostics for ML systems, including comfort with statistical methods for measuring model performance
Proficiency in Python and SQL, and comfort working with structured and unstructured data
Ability to operate effectively in ambiguous problem spaces — scoping technical approaches when requirements are not fully defined
Strong written communication skills: able to document systems and findings clearly and present recommendations to non-technical stakeholders
Nice to have
Experience designing data annotation workflows, labeling guidelines, or label quality processes
Experience with model deployment, monitoring, or production ML workflows
Familiarity with LangChain and LangSmith or similar LLM orchestration and observability tooling
Transportation or logistics industry experience
What we offer
Medical, dental, vision, life, disability, and supplemental coverage
Matching 401(k) program
Employee Resource Groups
Office-wide engagement activities, team events, and happy hours
Casual dress code
Work in downtown Chicago, IL
LifeStart gym with Peloton bikes and personal training
Free counseling sessions through Employee Assistance Program
Company paid holidays, paid vacation time and wellness days