Manuela Veloso
CMU
"Bounding the Suboptimality of Reusing Subproblems"
1/25/1999: [time not recorded]
[location not recorded]
Abstract: I have always been interested in the problem of reusing solution plans
to simpler problems to solve more complex ones. It is a well
recognized difficult and open question to understand the trade-offs in
solution quality of this reuse process. We have been studying this
problem within the context of action selection in a non-deterministic
environment. Markov decision processes (MDPs) provide a framework for
representing this action selection problem, and there are a number of
algorithms that learn optimal policies within this formulation. This
framework has also been used to study state space abstraction, problem
decomposition, and policy reuse. These techniques sacrifice
optimality of their solution for improved learning speed. In this
talk, we examine the suboptimality of reusing policies that are
solutions to subproblems. This is done within a restricted class of
MDPs, namely those where non-zero reward is received only upon
reaching a goal state. We introduce the definition of a subproblem
within this class and provide motivation for how reuse of subproblem
solutions can speed up learning. The contribution of this work is the
derivation of a tight bound on the loss in optimality from this reuse.
We examine a bound that is based on Bellman error, which applies to
all MDPs, but does not provide us with a tight bound. We contribute
our own theoretical result that gives an empirically tight bound on
this suboptimality.
This is joint work with Michael Bowling.
Last updated: Mon Jun 19 17:44:06 2006
 |