Representations of language in visually grounded neural models
Grzegorz Chrupala (Tilburg University)
The task of learning language in a visually grounded setting, with weak and noisy supervision, is of interest to scientists trying to understand the human mind as well as to engineers trying to build smart conversational agents or robots. In this talk, I present models of grounded language learning based on recurrent neural networks, which learn language from sentences paired with images of the corresponding visual scenes. Input sentences are given at different levels of granularity: as sequences of words, as sequences of phonemes, or as an acoustic signal.
I evaluate the internal representations induced by the models in these scenarios and present quantitative and qualitative analyses of their characteristics. I show how they encode language form and function, and specifically how they are selectively sensitive to certain aspects of language structure, such as word boundaries, lexical categories, and grammatical functions.