Navigating text adventures with algorithmic reasoners
Dr Petar Veličković
Text-based adventure games, such as Zork, use natural language to describe their setup, receive actions from the player, and describe the effects of such actions. However, they typically also have a clear underlying geometric structure, which links different states ("rooms") with each other as a result of taking actions (e.g. "moving"). This also endows them with a set of symmetries (e.g. it is often the case that going west, then going east, has the outcome of not changing the state). Hence, text adventures represent a fantastic test-bed for benchmarking agents on real-world noisy language data, while still grounding that data in a simplified underlying "world model".
Neural algorithmic reasoning (see Cappart et al., IJCAI'21; Section 3.3.) represents an emerging area of (graph) representation learning that seeks to emulate operations of classical algorithms and data structures within a high-dimensional latent space---allowing us to more directly execute classical algorithms over noisy data. A classical collection of algorithms shown to be efficiently learnable in this way are path-finding routines (such as the Bellman-Ford algorithm). Quickly solving text adventures typically involves (implicitly) constructing a map of the environment, determining the current and goal states in this map, then searching for a (shortest) path from the current state to the goal.
All of these components are tricky to do in a generic text adventure setting. To simplify, we will assume that the map is given to us. It is still non-trivial to map current and previous state descriptions to a particular current and goal state, and through the use of neural algorithmic reasoning, we will attempt to directly feed such natural language descriptions to an algorithm, and then use the output representations of that algorithm to find shortest paths to the required goal state.
The work will be based on the TextWorld environment, specifically the subset of coin-collecting environments proposed by Yuan, Côté et al. (EWRL'18), for which code and example agents are already available. While our focus will not necessarily be reinforcement learning, I anticipate that this will provide an excellent starting point. Neural path-finders will be trained by using the methodology proposed by Veličković et al. (ICLR'20) and deployed in the style of XLVIN (Deac et al., NeurIPS DeepRL'20).
Project timezone: C