Distributed and Asynchronous Policy Iteration for Bounded Parameter Markov Decision Processes

Willy Arthur Silva Reis, Karina Valdivia Delgado, Leliane Nunes de Barros

Markov Decision Processes can be used to solve sequential decision-making problems. However, it is sometimes difficult to estimate accurate transition probabilities. To deal with this problem, Bounded Parameter Markov Decision Processes (BMDPs) were introduced. In BMDPs the probabilities are given by intervals, and the objective is to obtain a policy that takes this imprecision into account. When robust policies are required, one of the criteria used is the maximin criterion, which chooses the best policy under the worst model. Classical solutions for BMDPs are Synchronous Robust Value Iteration and Synchronous Robust Policy Iteration. In this paper, we propose a new asynchronous algorithm called Asynchronous Robust Policy Iteration (ARPI). We also propose a distributed version named Distributed and Asynchronous Robust Policy Iteration (DARPI) that can be up to 26 times faster than the classical solution for large instances.
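The maximin backup mentioned in the abstract can be illustrated with a small sketch. The encoding below (dicts mapping states and actions to interval lists) and all function names are our own illustrative choices, not the paper's implementation; it shows a synchronous robust value iteration where an adversary picks, within the given probability intervals, the transition model that minimizes the expected value.

```python
def worst_case_value(intervals, values):
    """Expected value under the adversarial (minimizing) transition model.

    intervals: list of (lo, hi) probability bounds, one per successor state.
    values:    current value estimate of each successor state.
    Assumes sum(lo) <= 1 <= sum(hi), so a valid distribution exists.
    """
    # Start every successor at its lower bound, then spend the leftover
    # probability mass on the lowest-valued successors first.
    probs = [lo for lo, _ in intervals]
    slack = 1.0 - sum(probs)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        lo, hi = intervals[i]
        bump = min(hi - lo, slack)
        probs[i] += bump
        slack -= bump
    return sum(p * v for p, v in zip(probs, values))


def robust_value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Synchronous robust value iteration under the maximin criterion.

    P[s][a] is a list of (lo, hi) intervals over successor states and
    R[s][a] the immediate reward (a hypothetical toy encoding).
    """
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: max(
                R[s][a] + gamma * worst_case_value(P[s][a], [V[t] for t in states])
                for a in actions
            )
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new
```

The asynchronous variants proposed in the paper differ from this synchronous sketch in that states are updated individually rather than in one sweep per iteration.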


Biblioteca Digital Brasileira de Computação