Finding environmental discourse in historical newspapers: a topic model workflow for query disambiguation

Authors

  • Peeter Tinits

DOI: https://doi.org/10.5617/dhnbpub.10663

Abstract

Digitized historical newspapers are a treasure trove of information for our understanding of the past. In one popular application, the frequencies of query matches are used to gauge the prevalence of a discourse in a historical period. This requires the construction of good queries: broad enough to capture diverse contexts and narrow enough to exclude irrelevant ones. For historical research in the digital humanities, targeted queries that emphasize precision have been recommended. In this paper, we develop an alternative approach: we use broad queries to cast a wider net and then use topic models built on the match contexts to filter out irrelevant matches. Specifically, we look for contexts discussing environmental issues throughout the 20th century in a corpus of two Australian newspapers. We compare iteratively constructed narrow and broad queries in terms of their precision and recall, and find that our approach discovers roughly 7 to 10 times more matches with a comparable level of accuracy. This combined approach can work well for focussed research projects where deliberate query construction and qualitative feedback on the results are feasible.
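The workflow summarized above (broad query, topic-based filtering of the match contexts, then precision and recall checks) can be sketched in Python roughly as follows. This is a minimal, hypothetical sketch, not the paper's implementation: the function names `match_contexts`, `filter_by_topic` and `precision_recall`, the word-window size and the 0.5 threshold are assumptions made for illustration. Fitting the topic model itself is not shown; in practice the per-context topic distributions would come from a model such as LDA trained on the extracted contexts, with topics judged relevant or irrelevant by reading them. The distributions in the demo are invented.

```python
import re
from typing import List, Sequence, Set, Tuple


def match_contexts(text: str, pattern: str, window: int = 5) -> List[str]:
    """Keyword-in-context: return `window` words either side of each query hit.

    `pattern` is matched against the start of each lower-cased word, so a
    broad truncated query such as "pollut" also catches "pollution".
    """
    words = re.findall(r"[a-z']+", text.lower())
    query = re.compile(pattern)
    return [
        " ".join(words[max(0, i - window): i + window + 1])
        for i, word in enumerate(words)
        if query.match(word)
    ]


def filter_by_topic(
    doc_topics: Sequence[Sequence[float]],
    keep_topics: Set[int],
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of contexts whose summed share of the kept topics
    reaches `threshold` (a hypothetical decision rule)."""
    return [
        i
        for i, dist in enumerate(doc_topics)
        if sum(dist[t] for t in keep_topics) >= threshold
    ]


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Standard precision and recall against a hand-annotated set of relevant matches."""
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    text = (
        "Bush fires destroyed the forest environment near the river. "
        "The school environment was friendly to new pupils. "
        "Pollution of the harbour killed the fish."
    )
    # Broad query: any word starting with "environment" or "pollut".
    contexts = match_contexts(text, r"environment|pollut", window=3)
    for ctx in contexts:
        print(ctx)

    # Invented doc-topic distributions for the three contexts:
    # topic 0 ~ "nature/pollution", topic 1 ~ "schooling/social life".
    doc_topics = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
    kept = filter_by_topic(doc_topics, keep_topics={0})
    print(kept)  # → [0, 2]

    relevant = {0, 2}  # hand-annotated: context 1 is not environmental
    print(precision_recall(set(range(len(contexts))), relevant))  # broad query alone
    print(precision_recall(set(kept), relevant))  # after topic filtering
```

In this toy example the broad query alone also retrieves the school "environment", giving a precision of 2/3; keeping only contexts dominated by the hypothetical nature topic removes it without losing any relevant match. With real data, recall would typically have to be estimated from a manually annotated sample rather than known exactly.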

Published

2023-10-10

How to Cite

Tinits, Peeter. 2023. “Finding Environmental Discourse in Historical Newspapers: A Topic Model Workflow for Query Disambiguation”. Digital Humanities in the Nordic and Baltic Countries Publications 5 (1). Oslo, Norway: 344-54. https://doi.org/10.5617/dhnbpub.10663.