Towards a universal scheme for anaphora, deixis and nominal semantics
Massimo Poesio & the rest of the Universal Anaphora team [1]
About ten years ago the Universal Dependencies initiative was launched, with the objective of achieving cross-linguistic consistency in the annotation of dependency structure. The project has been very successful, in part because it managed to involve a very large percentage of the treebanking community. By contrast, the anaphoric annotation projects of the last twenty years, from AnCora to ARRAU to GUM to the NAIST Text Corpus to OntoNotes to the Polish Coreference Corpus to the Prague Dependency Treebank to TüBa-D/Z, have proceeded largely independently of each other. There has, however, been considerable interaction between the projects, so the annotation guidelines they adopted overlap substantially. In the last couple of years, a discussion has started within the CRAC and, more recently, CODI communities about launching a ‘Universal Anaphora’ initiative to develop, in the first instance, a common markup scheme building on those used in the 2010 SemEval Shared Task, the 2011 and 2012 CoNLL Shared Tasks, and the 2018 CRAC Shared Task. In this talk I will present some ideas in that direction that originated in our work on the GNOME, LiveMemories and ARRAU corpora.
[1] At present, the confirmed members of the team include Maciej Ogrodniczuk, Sameer Pradhan, Carolyn Rose, Michael Strube, Amir Zeldes, Yulia Grishina, Yufang Hou, Sopan Khosla, Fred Landragin, Ramesh Manuvinakurike, Vincent Ng, and Juntao Yu, but we hope more of the community will join us.