Distributed and Asynchronous Policy Iteration for Bounded Parameter Markov Decision Processes

Willy Arthur Silva Reis, Karina Valdivia Delgado, Leliane Nunes de Barros

Markov Decision Processes can be used to solve sequential decision-making problems. However, sometimes it is difficult to estimate accurate transition probabilities. To deal with this problem, Bounded Parameter Markov Decision Processes (BMDPs) were introduced. In BMDPs the probabilities are given by intervals, and the objective is to obtain a policy that takes this imprecision into account. When robust policies are required, one of the criteria used is the maximin criterion, which chooses the best policy under the worst model. Classical solutions for BMDPs are Synchronous Robust Value Iteration and Synchronous Robust Policy Iteration. In this paper, we propose a new asynchronous algorithm called Asynchronous Robust Policy Iteration (ARPI). We also propose a distributed version named Distributed and Asynchronous Robust Policy Iteration (DARPI) that can be up to 26 times faster than the classical solution for large instances.
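
To make the maximin criterion concrete, the sketch below illustrates how the classical Synchronous Robust Value Iteration mentioned in the abstract can be implemented over interval transition probabilities: nature picks, within the interval bounds, the distribution that minimizes the agent's expected value, and the agent then maximizes over actions. This is an illustrative sketch only, not the paper's code; the array layout, function names, and the assumption that the intervals are feasible (lower bounds summing to at most 1, upper bounds to at least 1) are ours.

```python
import numpy as np

def worst_case_distribution(p_low, p_high, values):
    """Nature's choice under the maximin criterion: a distribution within
    the interval bounds [p_low, p_high] that minimizes the expected value."""
    p = p_low.copy()
    slack = 1.0 - p.sum()           # probability mass still to be assigned
    for s in np.argsort(values):    # push extra mass toward low-value successors first
        extra = min(slack, p_high[s] - p_low[s])
        p[s] += extra
        slack -= extra
        if slack <= 0.0:
            break
    return p

def robust_value_iteration(P_low, P_high, R, gamma=0.9, eps=1e-6):
    """Synchronous robust value iteration for an interval (bounded-parameter) MDP.
    P_low, P_high: transition-probability bounds of shape (S, A, S).
    R: rewards of shape (S, A). Returns the robust values and a greedy policy."""
    S, A, _ = P_low.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                p = worst_case_distribution(P_low[s, a], P_high[s, a], V)
                Q[s, a] = R[s, a] + gamma * p @ V
        V_new = Q.max(axis=1)       # agent maximizes against the worst model
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

The asynchronous and distributed variants proposed in the paper (ARPI and DARPI) build on this kind of robust backup but update states without a global synchronous sweep; the full algorithms are described in the PDF linked below.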

http://www.lbd.dcc.ufmg.br/colecoes/eniac/2016/009.pdf

