Reinforcement Learning Engineer

Wiremind

Location:
France, Paris

Contract Type:
Not provided

Salary:
Not provided

Job Description:

At Wiremind, the Data Science team is responsible for the development, monitoring and evolution of all ML-powered forecasting and optimization algorithms used in our Revenue Management systems. Our algorithms are divided into two parts:

  • Demand modelling using ML models (e.g. deep learning, boosted trees) trained on historical time-series data
  • Constrained optimization problems solved using linear programming techniques

The team brings together all the profiles needed to form an autonomous department (DevOps, software and data engineering, data science, AI/ML, operational research) and works on a modern technical stack composed of Argo Workflows (pipeline orchestrator), MLflow (model and experiment tracking) and in-house Python packages. Recently, we have begun exploring new ways of solving our revenue optimization problems using Reinforcement Learning techniques instead of linear programming.

Job Responsibility:

  • Communicating daily with the data, ML and product teams to deepen your understanding of the business
  • Proposing new ideas to solve revenue optimization problems
  • Implementing, testing and evaluating these ideas in a controlled environment

Requirements:

  • Pursuing a Master’s Degree in Engineering, Data Science, Applied Mathematics or a similar field
  • Prior knowledge of standard Machine Learning techniques and best practices
  • Looking for an end-of-study internship
  • Passionate about addressing business challenges through innovative technological solutions
  • Committed to maintaining high-quality standards in all aspects of your work

Nice to have:

  • A first experience, internship or school project in Reinforcement Learning
  • Knowledge of Reinforcement Learning theory

What we offer:

  • Beautiful 800 m² offices in the heart of Paris (Bd Poissonnière)
  • Attractive remuneration
  • A caring and stimulating team that encourages skills development through initiative and autonomy
  • A learning environment with opportunities for career growth
  • 1 day of remote work per week
  • A great company culture (monthly after-work events, regular meetings on technology and products, annual off-site seminars, team-building…)

Additional Information:

Job Posted:
January 05, 2026

Work Type:
Hybrid work

Similar Jobs for Reinforcement Learning Engineer

AI Research Engineer - Reinforcement Learning

At Helsing we deliver AI-based capabilities and the enabling infrastructure that...
Location:
Germany, Munich
Salary:
Not provided
Helsing
Expiration Date
Until further notice
Requirements
  • Hold an MSc in machine learning with a specialty in reinforcement learning, multi-agent systems, automation and control, or robotics
  • Have excellent communication skills and the ability to report and present research findings clearly and efficiently both internally and externally
  • Are passionate about keeping up-to-date with current research and enjoy reimplementing / extending papers on state-of-the-art Deep Learning-based approaches
  • Possess solid software engineering skills, writing clean and well-structured code in Python and/or languages like Rust, Java, or modern C++, and experience deploying AI software to production including testing, QA, and monitoring
Job Responsibility
  • Design, train and deploy agents in complex multi-agent environments
  • Contribute to our reinforcement learning stack by implementing, improving and extending the current state of the art in multi-agent reinforcement learning
  • Be part of impactful projects and collaborate with people across several teams and backgrounds to integrate cutting-edge ML/AI into our production systems
What we offer
  • Competitive compensation and stock options
  • Relocation support
  • Social and education allowances
  • Regular company events and all-hands to bring together employees as one team across Europe
  • A hands-on onboarding program (affectionately labelled “AI-duction”), in which you will be familiarising yourself with our tools and ML pipelines used across the company
  • Fulltime

Senior Reinforcement Learning Engineer

Figure is an AI Robotics company developing a general purpose humanoid. Our Huma...
Location:
United States, San Jose
Salary:
150,000 - 400,000 USD / Year
Figure
Expiration Date
Until further notice
Requirements
  • Confident writing production quality code in PyTorch
  • Familiar with online and offline reinforcement learning algorithms: PPO, SAC, etc.
  • Experience tuning hyperparameters and cost functions for these RL algorithms
  • Familiarity with common RL techniques such as: domain randomization, curriculum learning, reward shaping, etc.
  • Familiarity with general ML evaluation tools such as TensorBoard, Weights&Biases, etc.
  • Strong mix of industry and research experience, ideally 5-7+ years of experience
Job Responsibility
  • Develop, train, and deploy reinforcement learning algorithms for locomotion and manipulation tasks
  • Build simulation infrastructure to support the training of locomotion and manipulation policies for a general purpose humanoid robot at a large scale
  • Collaborate with the controls team to integrate policies into the existing control stack
  • Define, test, and evaluate performance metrics for learned policies
  • Fulltime

Research Engineer, Reinforcement Learning

As a Research Engineer specializing in Reinforcement Learning, you will be respo...
Location:
United States, Palo Alto
Salary:
180,000 - 250,000 USD / Year
1X Technologies
Expiration Date
Until further notice
Requirements
  • Strong programming experience in Python and/or C++
  • Proficiency with PyTorch
  • Hands-on experience with simulation platforms like Isaac Sim or MuJoCo
  • Experience training reinforcement learning policies, particularly for manipulation or locomotion
  • Ability to collaborate cross-functionally with hardware, control, data, and QA teams
  • Demonstrated experience addressing the sim-to-real gap
Job Responsibility
  • Own the full stack of engineering tasks: from data engineering and model architecture to delivering polished products
  • Train NEO on a wide variety of manipulation and locomotion tasks
  • Collaborate with hardware teams to bridge the sim-to-real gap for policies trained in simulation
  • Partner with controls, quality assurance, and data collection teams to ship RL policies to production
  • Deploy reinforcement learning-trained skills into real-world home environments
What we offer
  • Health, dental, and vision insurance
  • 401(k) with company match
  • Paid time off and holidays
  • Fulltime

PhD Autonomy Engineer Intern - Planning & Controls (Reinforcement Learning)

Skydio builds the world’s most advanced autonomous drones used across inspection...
Location:
Switzerland, Zurich
Salary:
50.00 EUR / Hour
Skydio
Expiration Date
Until further notice
Requirements
  • PhD student in Robotics, Machine Learning, Controls, or related field
  • Strong fundamentals in RL, control theory, and motion planning; comfort with safety/robustness concepts
  • Proficient in Python (PyTorch/JAX/Ray RLlib) and at least one of C++ or CUDA
  • Hands-on experience with robotics simulation (Isaac Lab/MuJoCo/PyBullet) and sim2real techniques
  • Experience training/deploying policies for navigation, manipulation, or locomotion on real robots or autonomous vehicles
Job Responsibility
  • Develop and deploy reinforcement learning (and adjacent policy-learning methods) that make Skydio aircraft plan, navigate, and control themselves more intelligently—safely, reliably, and efficiently—across our ecosystem: handheld apps, ground control, cloud autonomy services, and fleet workflows
  • Navigation & avoidance in the wild: Train policies that adapt online to cluttered 3D scenes (forests, bridges, urban canyons), complementing our geometric stack for robust obstacle avoidance and dynamic goal-seeking
  • RL-augmented planning: Fuse learned cost shaping / value functions with trajectory optimization for smooth, agile flight with tight safety envelopes and mission constraints
  • Sim → Real at scale: Build scalable datasets and training loops with Isaac Lab, domain randomization, residual learning, and safety filters; validate on real drones weekly
  • Human-in-the-loop shared control: Learn assistive policies that blend pilot intent, autonomy priors, and uncertainty-aware behaviors for intuitive control handoffs
  • Fleet & multi-agent: Explore decentralized coordination for coverage, pursuit, and collaborative mapping with minimal comms

Applied Research Lead, Reinforcement Learning

We are building AI to simulate the world through merging art and science. We bel...
Location:
United States
Salary:
280,000 - 380,000 USD / Year
Runway
Expiration Date
Until further notice
Requirements
  • 4+ years of relevant engineering or research experience in applying reinforcement learning to align language, image, and/or video generation models
  • Very strong programming skills and ability to write clean and maintainable research code
  • Deep interest in building human-in-the-loop systems for creativity
  • Passion for seeing research through from initial conception to eventual application
  • Experience mentoring and teaching other researchers
  • Strong communication, collaboration, and documentation skills
Job Responsibility
  • Lead efforts in applying reinforcement learning based techniques to improve the quality and controllability of the models that power Runway’s research and tools
  • Fulltime

Associate Director, Reinforcement Learning (ML)

Lead Amgen’s strategy and execution for Reinforcement Learning from Human Feedba...
Location:
United States, Thousand Oaks; Jacksonville
Salary:
Not provided
Amgen
Expiration Date
Until further notice
Requirements
  • Doctorate degree and 3 years of Computer Science, IT or related field experience, or
  • Master’s degree and 5 years of Computer Science, IT or related field experience, or
  • Bachelor’s degree and 7 years of Computer Science, IT or related field experience, or
  • Associate’s degree and 12 years of Computer Science, IT or related field experience, or
  • High school diploma / GED and 14 years of Computer Science, IT or related field experience
  • Deep, hands-on expertise in Reinforcement Learning from Human Feedback (RLHF) and/or advanced reinforcement learning, including reward modeling, policy optimization, exploration strategies, and offline/online evaluation
  • Demonstrated experience deploying RLHF or RL systems into production for real-world applications (e.g., large language models, recommendation systems, decision support tools, or workflow automation), ideally in healthcare, life sciences, or other regulated domains
  • Strong background in modern machine learning and deep learning, with practical experience in Python and frameworks such as PyTorch or TensorFlow, and familiarity with LLM ecosystems and tooling
  • Experience driving sophisticated, cross-functional initiatives, collaborating with non-technical stakeholders (e.g., physicians, scientists, commercial leaders, compliance, legal) and translating needs into impactful AI solutions
  • Strong ability to communicate complex technical topics simply, tailoring content to senior executives and non-technical audiences
Job Responsibility
  • Lead the design and development of RLHF systems including reward modeling, policy optimization, safety and alignment mechanisms, and evaluation frameworks for large language models and other AI systems
  • Drive hands-on technical execution, particularly for high-impact projects, reviewing architectures, experimentation plans, and code, and helping the team navigate scientific and engineering trade-offs
  • Establish best-practice pipelines for human feedback, partnering closely with internal customer teams to define feedback protocols, annotation quality standards, and governance for RLHF data
  • Define and track success metrics for RLHF systems, balancing offline and online evaluation, A/B tests, safety and robustness criteria, and business or scientific outcomes
  • Collaborate across Amgen leaders to ensure RLHF solutions are aligned with strategy, compliant with policy, and integrated into real workflows
  • Partner with Data, Platform and Technology teams to ensure that RLHF workloads are supported by scalable data platforms, model hosting, experimentation infrastructure, and MLOps best practices
  • Champion responsible and compliant AI, working with Legal, Compliance, and Information Security to implement governance around human feedback, data usage, model behavior, transparency, and risk management in a regulated environment
  • Communicate insights and influence senior stakeholders, creating clear narratives, roadmaps, and recommendations that help executives understand RLHF trade-offs, risks, and opportunities
What we offer
  • A comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts
  • A discretionary annual bonus program, or for field sales representatives, a sales-based incentive plan
  • Stock-based long-term incentives
  • Award-winning time-off plans
  • Flexible work models where possible

Bike sharing system rebalancing by reinforcement learning algorithms

This internship project focuses on a specific component of a broader initiative ...
Location:
France, Lyon
Salary:
Not provided
ABG - Association Bernard Gregory
Expiration Date
Until further notice
Requirements
  • Master’s student (M2) or in the final year of an Engineering School program
  • Background in Computational Mechanics, Applied Mathematics, or Data Science
  • Knowledge and experience in numerical modeling and simulation of physical or dynamical systems
  • Knowledge and experience in machine learning or statistical data analysis
  • Knowledge and experience in time series forecasting and spatio-temporal modeling
  • Knowledge and experience in optimization and/or reinforcement learning methods
  • Programming skills in Python (preferred), including libraries such as NumPy, Pandas, PyTorch, or TensorFlow
  • Data visualization and exploratory data analysis
  • Familiarity with version control tools (e.g., Git) and collaborative coding practices
  • Good written and oral communication skills in English
Job Responsibility
  • Develop predictive models to estimate short-term bicycle availability and demand at both the station and network levels using spatio-temporal data
  • Analyze and preprocess heterogeneous datasets, including trip records, station metadata, weather conditions, and temporal factors, to create robust inputs for modeling
  • Implement and compare different machine learning approaches (e.g., time series forecasting, graph neural networks, spatio-temporal models) to capture flow dynamics in the bikeshare system
  • Evaluate the performance and scalability of predictive algorithms under realistic conditions, using metrics relevant to operational decision-making in mobility systems
  • Provide data-driven inputs for the reinforcement learning module, enabling the development of adaptive and real-time rebalancing strategies in the second phase of the project
  • Integrate uncertainty quantification to assess the confidence of predictions and their impact on rebalancing decisions
  • Explore online or incremental learning techniques to enable continuous model adaptation as new data streams become available

Reinforcement learning intern

As a Reinforcement Learning Intern, you will help develop and implement learning...
Location:
France, Paris
Salary:
Not provided
Enchanted Tools
Expiration Date
Until further notice
Requirements
  • BSc holder in Robotics, Engineering, Computer Science, or related field
  • Coursework or project experience in reinforcement learning or learning-based control
  • Strong Python skills and knowledge of a deep learning framework (PyTorch, JAX, or TensorFlow)
  • Familiarity with simulation environments such as Isaac Sim, MuJoCo, or Gazebo
  • Solid analytical and problem-solving abilities
Job Responsibility
  • Develop, debug, and test reinforcement learning algorithms for locomotion and navigation on a dynamically balancing base
  • Extend simulation environments (Isaac Sim / Isaac Lab) to support training and evaluation of RL policies
  • Integrate trained policies into the Mirokai software stack and validate them on physical robots
  • Analyze performance, stability, and sim-to-real transfer aspects
  • Stay up to date with recent research in reinforcement learning for robotics