topics germane to autonomous systems. The vehicle routing problem (VRP) is a
particularly well-studied decision problem in which the goal is to find an
optimal allocation of agents to tasks while respecting spatial, temporal, and
capacity constraints. Air traffic control, postal service package delivery, and
many other similar applications can generally be categorized as multi-agent
vehicle routing. Traditional solution approaches have a difficulty in coping
with such broad problems, characterized by vast state spaces, evolving
constraints, unstructured environments, and ambiguous state transition
dynamics.
Departing from traditional optimization and search-based
approaches, this work seeks to pose vehicle-routing-based problems in a machine
learning context. A representation scheme is presented that scales
independently of the physical problem size. The abstracted cost-based state
captures an agent’s ability to perform a certain task, transferring complexity
from constraints into the state itself. In order to capture temporal
constraints such as deadlines without increasing complexity, a semi-Markov
model is proposed and analyzed relative to a traditional Markov model.
Well-known iterative learning algorithms based on dynamic programming readily
solve for optimal policies and can be compared to known VRP solutions. Results
borne out in simulation demonstrate the benefits of modeling high-level decision
problems such as VRPs in a learning framework.