Coping with Variation in the Icelandic Diachronic Treebank

  • Eiríkur Rögnvaldsson, University of Iceland
  • Anton Karl Ingason, University of Iceland
  • Einar Freyr Sigurðsson, University of Iceland

Abstract

We present an overview of an ongoing project that aims to develop methods for building a treebank of Icelandic. The treebank will contain both written and spoken language and will also have a diachronic dimension. Because Icelandic is what has been called a less-resourced language in computational linguistics and language technology, it is essential to use the limited resources available as economically and efficiently as possible. We emphasize the importance of open-source software and of the interplay between linguistic knowledge and technological skills. We describe the workflow for constructing the treebank and show how the different software tools work together to produce the final representation. Finally, we show how the treebank can be used to study some well-known phenomena in Icelandic syntax.

Author Biographies

Eiríkur Rögnvaldsson, University of Iceland
Professor, Faculty of Icelandic and Comparative Cultural Studies
Anton Karl Ingason, University of Iceland
Master's student, Faculty of Icelandic and Comparative Cultural Studies
Einar Freyr Sigurðsson, University of Iceland
Master's student, Faculty of Icelandic and Comparative Cultural Studies

Published
2011-06-17