For humans and animals to behave adaptively in their environment, they must learn the consequences of different actions from very limited feedback. Often this feedback, in the form of reward or punishment, is unreliable because it does not always appear. Even when reward (or punishment) does arrive, it is usually so delayed that it is very difficult to know which of the many preceding actions earned it. And yet both animals and humans are astonishingly good at this job. The theoretical and experimental field of reinforcement learning deals with learning to predict future rewards and punishments, and with choosing the best behavior given these predictions. This job is further complicated by the need to find the Goldilocks balance between specificity of the learned task and generalization across contexts. To quote Heraclitus, you cannot step into the same river twice. Obviously, even the most efficient reinforcement learning would be useless if what was learned applied only to the exact circumstances in which the problem was first encountered. On the other hand, the behavior learned by a pilgrim in the Jordan River might prove fatal in the crocodile-infested Daly River. Therefore, adaptive reinforcement learning first requires correct identification of the relevant circumstances, or state representation. Finally, world statistics (among them the laws of reward) are not guaranteed to remain fixed, so it is beneficial for a learner to embark on occasional exploratory expeditions to update its internal model of the world. Thus, on each occasion, a decision is made according to the chosen behavioral policy. Ideally, this policy includes a well-balanced mixture of exploration, which supports new learning, and exploitation, in which previously learned best actions are performed to collect rewards.
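To make the exploration/exploitation balance described above concrete, here is a minimal sketch of one standard textbook scheme, an epsilon-greedy learner on a two-option task with unreliable reward. It is an illustration only, not a description of any experiment or model used in the lab: the function name `run_bandit`, the reward probabilities, and the parameters `epsilon` (exploration rate) and `alpha` (learning rate) are all assumptions chosen for the example.

```python
import random


def run_bandit(reward_probs, n_trials=5000, epsilon=0.1, alpha=0.05, seed=0):
    """Epsilon-greedy learner on a task with probabilistic (unreliable) reward.

    All names and parameter values here are hypothetical, for illustration.
    """
    rng = random.Random(seed)            # seeded generator for reproducibility
    values = [0.0] * len(reward_probs)   # learned reward prediction per action
    choices = []

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Exploration: occasionally try a random action.
            action = rng.randrange(len(values))
        else:
            # Exploitation: take the action currently predicted to be best.
            action = max(range(len(values)), key=lambda a: values[a])

        # Reward is unreliable: it appears only with some probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Move the prediction toward the outcome by a fraction of the
        # prediction error (reward minus current prediction).
        values[action] += alpha * (reward - values[action])
        choices.append(action)

    return values, choices


# Assumed example: action 1 is rewarded far more often than action 0.
values, choices = run_bandit([0.2, 0.8])
print("learned predictions:", values)
print("fraction of trials choosing action 1:", choices.count(1) / len(choices))
```

With a small `epsilon` the learner mostly exploits its current predictions, while the occasional random choice lets it discover the better action and keep tracking it if the reward probabilities were to change.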
In our lab, we study the neural mechanisms underlying these learning problems. To do this, we simultaneously record the activity of neurons in the rodent brain, many aspects of the animal’s immediate behavior, and changes in environmental conditions, while the animals learn to solve difficult tasks. Synchronizing the neural signals with the behavior and the state of the world allows us to learn how environmental variables are incorporated to affect behavior, and how these processes are implemented in the relevant neural circuits. In different projects we study neural representations in the hippocampus, striatum, cortex and midbrain, and how these representations give rise to learning.