Rollout algorithms provide a method for approximately solving a large class of discrete and dynamic optimization problems. Using a lookahead approach, rollout algorithms leverage repeated use of a greedy algorithm, or base policy, to intelligently make decisions. This technique is easy to implement, inherits performance bounds given by the selected base policy, and performs very well in practice. In some cases the observed performance is near optimal. However, there have been few theoretical results guaranteeing a strict improvement over base policies.
Our work shows that for simple combinatorial optimization problems such as the subset sum problem and knapsack problem, rollout algorithms are proven to yield a strict improvement in comparison to base policies. Our bounds hold for average-case instances of these problems, complementing existing work on worst-case performance. We distinguish between two methods in which the rollout algorithm can be applied, which we refer to as the consecutive rollout and the exhaustive rollout. We are now analyzing more complicated problems, such as multi-dimensional knapsack problems and decision trees.