Current Challenges in Computational Literary Studies

Abstract

Within the broad and very diverse field of Digital Humanities, Computational Literary Studies (CLS) is one of the longest-standing areas of practice, going back at least to the 1960s, albeit under a number of different names, among them Stylometry, Computational Stylistics, Corpus Stylistics, Cultural Analytics or Distant Reading, each with its own perspectives, particularities and methods. At the moment, however, CLS appears to be the name around which many scholars, at least in Europe, are gathering.

This talk is about the current state of affairs in CLS. My aim is to describe in the abstract, and to illustrate with concrete examples from projects I know well, some of the most pressing challenges I currently see in the field from a European perspective. These are areas that we should all work on together in the coming years if we want to move the field forward. Even though I can only speak about CLS in Europe, I hope that some of the challenges I describe will also provide fruitful impulses for Digital Humanities in China.

  • The first challenge is operationalization, in the area of research design. Research in CLS needs to develop two things in parallel and in close interaction: on the one hand, our understanding of a phenomenon or concept as established in literary studies; on the other hand, the implementation we create in order to model, measure, or otherwise compute indicators for this phenomenon or concept in the computational domain. This is especially challenging when there is little solid ground to start from, either conceptually or computationally. I will illustrate this challenge drawing on the experience gained so far in our project Zeta and Company, which deals with concepts and measures of ‘keyness’ when comparing two groups of texts.
  • The second challenge concerns diversity. One of the great promises and rewards of CLS (and of Digital Humanities more generally) is the possibility to look beyond the narrow canon of highly prestigious authors or influential events and to embrace the full diversity of cultural production, both present and historical. To foster diversity when designing, building, annotating and analyzing corpora, we need to address the challenges posed by multiple languages, distinct writing systems and diverse cultural contexts. I will illustrate this with my experience in the project Distant Reading for European Literary History, where we are collaboratively building roughly comparable collections of 100 novels each in more than 16 different European languages.
  • The third challenge is metadata. The value of detailed, relevant metadata about our texts becomes increasingly clear in CLS as we build and analyze ever larger corpora that include many little-known texts, and as we refine our methods of analysis. Generating this metadata is challenging because it needs to be extracted automatically from sources as diverse as bibliographies and catalogs, literary histories and scholarly articles, and the literary texts themselves. The metadata also needs to be modeled in a useful way. I will illustrate this challenge with some lessons from our ongoing project Mining and Modeling Text.
  • The fourth challenge concerns Open Access. Unrestricted access to both scholarly publications and primary literature (novels, plays, poetry) is an essential condition for state-of-the-art research in CLS, as it makes research transparent, reproducible and sustainable. The challenge lies, first, in the need to balance these scholarly requirements with the legitimate concerns of publishers in the context of copyright legislation and, second, in the fact that, as literary scholars, we can no longer ignore the legal domain.
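To make the operationalization challenge from the first point a little more concrete, here is a minimal sketch of one simple way to compute 'keyness' between two groups of texts: a Zeta-style score, assumed here in its simplest form, where texts are cut into fixed-size segments and each word's score is the share of segments in group 1 that contain it minus the share of segments in group 2 that contain it. The function names, the segment size and the toy data are illustrative assumptions, not the implementation used in the Zeta and Company project.

```python
from typing import Dict, List


def segment(tokens: List[str], size: int) -> List[List[str]]:
    """Cut a token list into consecutive segments of `size` tokens (a shorter remainder is dropped)."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def segment_proportions(segments: List[List[str]]) -> Dict[str, float]:
    """For each word, the share of segments in which it occurs at least once."""
    if not segments:
        raise ValueError("no segments: texts may be shorter than the segment size")
    counts: Dict[str, int] = {}
    for seg in segments:
        for word in set(seg):  # presence, not frequency
            counts[word] = counts.get(word, 0) + 1
    return {w: c / len(segments) for w, c in counts.items()}


def zeta(group1: List[List[str]], group2: List[List[str]], size: int = 5000) -> Dict[str, float]:
    """Hypothetical Zeta-style keyness: +1 = present in all group-1 segments and no group-2 segments."""
    p1 = segment_proportions([s for text in group1 for s in segment(text, size)])
    p2 = segment_proportions([s for text in group2 for s in segment(text, size)])
    return {w: p1.get(w, 0.0) - p2.get(w, 0.0) for w in set(p1) | set(p2)}


if __name__ == "__main__":
    # Toy, pre-tokenized "texts"; real segments would be thousands of tokens long.
    sea = [["the", "sea", "ship", "the", "sea", "wind"]]
    city = [["the", "city", "street", "the", "city", "rain"]]
    scores = zeta(sea, city, size=3)
    for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{word:8s}{score:+.2f}")
```

Even this small sketch involves several operational decisions (segment size, presence versus frequency, how to normalize the difference) that each encode a particular understanding of what 'keyness' means, which is exactly where the conceptual and computational sides need to be developed together.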