Coping with Variation in the Icelandic Diachronic Treebank
Abstract
We present an overview of an ongoing project that aims to develop methods for building a treebank of Icelandic. The treebank will contain both written and spoken language and will also have a diachronic dimension. Icelandic is an example of what has been called a less-resourced language in computational linguistics and language technology, so it is essential to use the limited available resources as economically and efficiently as possible. We emphasize the importance of open source software and of the interplay between linguistic knowledge and technological skills. We describe the workflow in the construction of the treebank and show how the different software tools work together to produce the final representation. Finally, we show how the treebank can be used to study some well-known phenomena in Icelandic syntax.