Safe Inclusion of Information about Rates of Variation in a Reinforcement Learning Algorithm

Carlos H. C. Ribeiro

There is a need to enhance reinforcement learning techniques with prior knowledge built into the agent at its inception. The informational crudeness under which those algorithms operate may be interesting from a theoretical point of view, but treating the learning agent as a tabula rasa makes large-scale problems too difficult and unrealistic. Nonetheless, knowledge must be embedded in such a way that the structural, well-studied characteristics of the fundamental algorithms are maintained. This article investigates a more general formulation of a classical reinforcement learning method. It allows information derived from single updates to spread towards a neighbourhood of the instantly visited state, and it converges to optimality. We show how this new formulation can be used as a mechanism to safely embed prior knowledge about expected rates of variation of action values, and practical studies demonstrate an application of the proposed algorithm.
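The spreading mechanism the abstract describes can be illustrated with a short sketch. This is not the paper's implementation: the function names (`qs_update`, `sigma_width`), the Gaussian spreading function over state indices, and the toy chain environment are all assumptions introduced here for illustration. The idea is that a single temporal-difference correction, instead of updating only the visited state, is applied to neighbouring states weighted by a spreading function that equals one at the visited state itself; with a Kronecker-delta spreading function, the update reduces to standard Q-learning.

```python
import numpy as np

def qs_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, sigma_width=1.0):
    """One spread update: propagate the temporal-difference correction
    from the visited state s to neighbouring states x, weighted by a
    spreading function sigma(s, x) with sigma(s, s) = 1.
    (Hypothetical sketch; names and spreading function are assumptions.)"""
    n_states = Q.shape[0]
    for x in range(n_states):
        # Gaussian spreading over state indices -- an assumed neighbourhood
        # metric; any similarity measure with sigma(s, s) = 1 would do.
        sigma = np.exp(-((x - s) ** 2) / (2 * sigma_width ** 2))
        # TD error evaluated at the neighbouring state's current estimate.
        td_error = r + gamma * Q[s_next].max() - Q[x, a]
        Q[x, a] += alpha * sigma * td_error
    return Q

# Toy chain with 5 states and 2 actions, all values initially zero.
Q = np.zeros((5, 2))
Q = qs_update(Q, s=2, a=0, r=1.0, s_next=3)
```

After one update, the visited state receives the full correction while its neighbours receive progressively smaller ones, which is how prior knowledge about expected rates of variation of action values could shape the choice of spreading function.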


Biblioteca Digital Brasileira de Computação