Continuous Time Reinforcement Learning. From Theory to Practice.

December 22, 2022
Offered Salary: Negotiable
Working address: N/A
Contract Type: Other
Working Time: Negotiable
Working type: N/A
Job Ref.: N/A

2022-05495 - Continuous Time Reinforcement Learning. From Theory to Practice.

Level of qualifications required : Graduate degree or equivalent

Function : Research internship

About the research centre or Inria department

The Inria University of Lille centre, created in 2008, employs 360 people including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-de-France region, the Inria University of Lille centre pursues a close relationship with large companies and SMEs. By promoting synergies between researchers and industrialists, Inria participates in the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region.

For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of French Tech, with a technology showroom based on Avenue de Bretagne in Lille, on the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).

  • Level: Master's-level research internship (M2) or equivalent (final-year engineering school internship)
  • Where: Villeneuve d'Ascq
  • When: Flexible, sometime in 2022-2023, 4 months minimum
  • Financial support: about 500 euros/month (internship allowance)
  • Team: Scool
  • Advisers: Alena Shilova, Philippe Preux, Bruno Raffin

Assignment and main activities

    In recent years, reinforcement learning (RL) has attracted a lot of attention. Deep RL has surpassed human, and even expert, performance on tasks such as Atari games and Go. Unlike classical (supervised) machine learning, RL trains an agent to take decisions based on the state of the environment it is in. This capability has led to applications in numerous domains.

    Control is one of the many applications of RL. The simplest control tasks, such as CartPole or Pendulum, are common testbeds for new RL algorithms, while more complex ones are of interest for robot learning. These tasks are usually formulated in discrete time. On the one hand, this simplifies the problem so that state-of-the-art RL algorithms are applicable; on the other hand, it can lead to suboptimal control, because decisions can only be taken at regularly spaced time steps. Yet some problems require taking decisions at arbitrary moments in time or at high frequency, e.g. high-frequency stock trading, autonomous driving, and snowboarding.

    Continuous Time Reinforcement Learning (CTRL), unlike Discrete Time Reinforcement Learning (DTRL), treats time as continuous. In this setting, the dynamics of the system are expressed as an ODE (ordinary differential equation) for deterministic environments and as an SDE (stochastic differential equation) for stochastic environments. The optimal value function (the expected cumulative reward obtained by acting optimally, a standard measure of the quality of a policy) is characterized by the Hamilton-Jacobi-Bellman (HJB) equation. The HJB equation is a PDE (partial differential equation) that replaces the Bellman equation used in discrete time. Despite promising performance on simple use cases [1, 2, 3, 4, 5], CTRL methods do not match the performance of DTRL algorithms in the general case. Several challenges prevent CTRL from scaling further:

  • the increased computational complexity of the algorithms;
  • the need for a model of the dynamics during training;
  • exploration, which becomes even more difficult.

    However, the emerging field of scientific machine learning (SciML), which combines neural networks with differential equations (e.g. physics-informed neural networks [6, 7] or neural ODEs [8]), is bringing new tools to address CTRL.
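For reference, the continuous-time quantities mentioned above can be written as follows. This is a standard textbook formulation given as a sketch, not the exact notation of the cited works. The symbols are assumptions of this sketch: $x$ (state), $u$ (action), $f$ (drift), $\sigma$ (diffusion, assumed here to depend on $x$ only), $W_t$ (a Wiener process), $r$ (reward rate) and $\rho > 0$ (discount rate). The closed-form feedback in the fourth line additionally assumes control-affine dynamics and a quadratic action cost. That is the kind of structure exploited by methods such as [4]. The last line is the continuous-time TD error in the form used by Doya [1], rewritten with $\rho$ in place of his time constant $1/\tau$.

```latex
% Dynamics: deterministic (ODE) and stochastic (SDE)
\dot{x}(t) = f\big(x(t), u(t)\big),
\qquad
\mathrm{d}x_t = f(x_t, u_t)\,\mathrm{d}t + \sigma(x_t)\,\mathrm{d}W_t

% Discounted value of a policy \pi
V^{\pi}(x) = \mathbb{E}\left[\int_{0}^{\infty} e^{-\rho t}\, r\big(x_t, \pi(x_t)\big)\,\mathrm{d}t \,\middle|\, x_0 = x\right]

% Hamilton-Jacobi-Bellman equation (the trace term vanishes when \sigma = 0)
\rho V^{*}(x) = \max_{u}\left[ r(x, u) + \nabla_x V^{*}(x)^{\top} f(x, u)
  + \tfrac{1}{2}\operatorname{tr}\!\big(\sigma(x)\sigma(x)^{\top}\nabla_x^{2} V^{*}(x)\big) \right]

% Control-affine case: f(x,u) = a(x) + B(x)u and r(x,u) = q(x) - \tfrac{1}{2}u^{\top}Ru with R \succ 0.
% The bracket is concave in u; setting its gradient -Ru + B(x)^{\top}\nabla_x V^{*}(x) to zero gives
u^{*}(x) = R^{-1} B(x)^{\top} \nabla_x V^{*}(x)

% Continuous-time TD error: zero along deterministic trajectories of \pi when V = V^{\pi}
\delta(t) = r\big(x(t), u(t)\big) - \rho V\big(x(t)\big) + \frac{\mathrm{d}}{\mathrm{d}t} V\big(x(t)\big)
```

In discrete time, the HJB equation plays the role of the Bellman optimality equation. The gradient $\nabla_x V^{*}$ replaces the one-step lookahead. This is why learning a smooth, differentiable value function matters in CTRL.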

    The objective of this internship is to develop a basic environment for CTRL with a few classical use cases (CartPole, Pendulum, Acrobot, Swimmer), to test some promising CTRL strategies, and to explore possible improvements.

  • Understanding and implementing several CTRL algorithms [3, 4, 5]
  • Testing them on continuous-time environments such as CartPole, Pendulum, Acrobot and Swimmer
  • Improving existing strategies in several directions: value function learning, exploration techniques, and stability of the methods
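A continuous-time environment differs from a usual discrete-time one in two ways. Its dynamics are an ODE, and decisions need not fall on a fixed time grid. The sketch below illustrates this with a Pendulum in plain Python. It is an illustrative assumption, not the team's code. The class names `PendulumParams` and `ContinuousTimePendulum`, the physical constants, the reward shape and the fixed-step RK4 integrator are all hypothetical choices. The agent picks a torque and how long to hold it. The environment integrates the ODE over that interval and returns the (undiscounted) reward integrated with the trapezoidal rule.

```python
import math
from dataclasses import dataclass


@dataclass
class PendulumParams:
    # Hypothetical physical constants, chosen for illustration only.
    g: float = 9.81        # gravitational acceleration [m/s^2]
    length: float = 1.0    # pole length [m]
    mass: float = 1.0      # point mass at the tip [kg]
    damping: float = 0.1   # viscous damping coefficient [1/s]


class ContinuousTimePendulum:
    """Hypothetical pendulum environment whose dynamics are an ODE.

    theta = 0 means hanging down, theta = pi means upright. The agent chooses an
    action u (a torque) AND how long to hold it, so decisions can be taken at
    arbitrary, irregular moments rather than on a fixed time grid.
    """

    def __init__(self, params=None, max_substep=1e-3):
        self.p = params if params is not None else PendulumParams()
        self.max_substep = max_substep  # upper bound on the integrator step [s]
        self.reset()

    def reset(self, theta=0.0, omega=0.0):
        self.t = 0.0
        self.state = (theta, omega)
        return self.state

    def dynamics(self, state, u):
        # Right-hand side of the ODE x_dot = f(x, u) with x = (theta, omega).
        theta, omega = state
        p = self.p
        inertia = p.mass * p.length ** 2
        alpha = -(p.g / p.length) * math.sin(theta) - p.damping * omega + u / inertia
        return (omega, alpha)

    def reward_rate(self, state, u):
        # Instantaneous reward r(x, u): penalise distance to upright, speed and effort.
        theta, omega = state
        err = math.atan2(math.sin(theta - math.pi), math.cos(theta - math.pi))
        return -(err ** 2 + 0.1 * omega ** 2 + 0.001 * u ** 2)

    def _rk4(self, state, u, h):
        # One classical Runge-Kutta 4 step; u is held constant (zero-order hold).
        def shifted(k, c):
            return (state[0] + c * k[0], state[1] + c * k[1])

        k1 = self.dynamics(state, u)
        k2 = self.dynamics(shifted(k1, h / 2), u)
        k3 = self.dynamics(shifted(k2, h / 2), u)
        k4 = self.dynamics(shifted(k3, h), u)
        return tuple(
            s + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4)
        )

    def step(self, u, duration):
        """Hold action u for `duration` seconds.

        Returns (new_state, reward integrated over the interval, current time).
        The reward integral is undiscounted and uses the trapezoidal rule.
        """
        if duration <= 0:
            raise ValueError("duration must be positive")
        n = max(1, math.ceil(duration / self.max_substep))
        h = duration / n
        state, total_reward = self.state, 0.0
        for _ in range(n):
            r0 = self.reward_rate(state, u)
            state = self._rk4(state, u, h)
            total_reward += 0.5 * (r0 + self.reward_rate(state, u)) * h
        self.state = state
        self.t += duration
        return state, total_reward, self.t


if __name__ == "__main__":
    env = ContinuousTimePendulum()
    env.reset(theta=0.5)
    # Irregular decision times: (torque, holding duration in seconds).
    for u, dt in [(1.0, 0.05), (-0.5, 0.013), (0.0, 0.2)]:
        state, reward, t = env.step(u, dt)
        print(f"t={t:.3f}s state={state} reward={reward:.4f}")
```

A fixed-step RK4 integrator keeps the sketch dependency-free. An adaptive solver such as SciPy's `solve_ivp` would be a natural alternative for stiffer systems like Acrobot or Swimmer. With damping set to zero and no torque, the mechanical energy is conserved. This gives a quick sanity check of the integrator.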

References
  [1] Doya, Kenji. “Reinforcement learning in continuous time and space.” Neural Computation 12.1 (2000): 219-245. https://
  [2] Munos, Rémi. “A study of reinforcement learning in the continuous case by the means of viscosity solutions.” Machine Learning 40.3 (2000): 265-299. https://
  [3] Yildiz, Cagatay, Markus Heinonen, and Harri Lähdesmäki. “Continuous-time model-based reinforcement learning.” International Conference on Machine Learning. PMLR, 2021. https://
  [4] Lutter, Michael, et al. “HJB optimal feedback control with deep differential value functions and action constraints.” Conference on Robot Learning. PMLR, 2020. https://
  [5] Lutter, Michael, et al. “Value iteration in continuous actions, states and time.” arXiv preprint arXiv:2105.04682 (2021). https://
  [6] NVIDIA Modulus - Physics Informed Neural Networks. https://

Benefits package
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

    About 590€ gross per month (internship allowance)

    General Information
  • Theme/Domain : Optimization, machine learning and statistical methods; Scientific computing (BAP E)

  • Town/city : Lille

  • Inria Center : Centre Inria de l'Université de Lille
  • Starting date : 2023-02-01
  • Duration of contract : 6 months
  • Deadline to apply : 2022-12-22

Contacts
  • Inria Team : SCOOL
  • Recruiter : Shilova Alena

About Inria

    Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.

    Instruction to apply

    CV + cover letter

    Defence Security : This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.

    Recruitment Policy : As part of its diversity policy, all Inria positions are accessible to people with disabilities.

    Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.
