PoTeC: The Potsdam Textbook Corpus

Description

The Potsdam Textbook Corpus (PoTeC) is a naturalistic eye-tracking-while-reading corpus containing data from 75 participants reading 12 scientific texts. PoTeC is the first naturalistic eye-tracking-while-reading corpus that contains eye movements from domain experts as well as novices in a within-participant manipulation: it is based on a 2×2×2 fully crossed factorial design, with the participants’ level of studies and discipline of studies as between-subject factors and the text domain as a within-subject factor. The participants’ reading comprehension was assessed with a series of text comprehension questions, and their domain knowledge was tested with text-independent background questions for each of the texts. The materials are annotated for a variety of linguistic features at different levels. We envision PoTeC being used for a wide range of studies, including but not limited to analyses of expert and non-expert reading strategies. The corpus, the accompanying data at all stages of the preprocessing pipeline, and all code used to preprocess the data are available via GitHub and OSF. The data is furthermore integrated into the open-source package pymovements, which can be used in Python and R.
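To make the 2×2×2 design concrete, the following minimal sketch enumerates its cells. The factor names follow the description above; the level labels (`level_A`, `discipline_A`, `domain_A`, and so on) are placeholders chosen for illustration, since the actual levels are not spelled out here, and the helper names are likewise hypothetical.

```python
from itertools import product

# Factor names come from the description; the level labels are
# placeholders (assumptions), not the corpus's actual category names.
FACTORS = {
    "level_of_studies": ("level_A", "level_B"),                 # between-subject
    "discipline_of_studies": ("discipline_A", "discipline_B"),  # between-subject
    "text_domain": ("domain_A", "domain_B"),                    # within-subject
}
BETWEEN_SUBJECT = ("level_of_studies", "discipline_of_studies")


def design_cells(factors):
    """Return every combination of factor levels (the fully crossed design)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def participant_groups(factors, between):
    """Return the combinations of the between-subject factors only."""
    levels = (factors[name] for name in between)
    return [dict(zip(between, combo)) for combo in product(*levels)]


cells = design_cells(FACTORS)
groups = participant_groups(FACTORS, BETWEEN_SUBJECT)

print(len(cells))   # number of design cells
print(len(groups))  # number of participant groups
```

Crossing all three factors gives eight cells, while the two between-subject factors alone define four participant groups. Because text domain varies within participants, each group contributes data to both text domains.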

Accessing the data

The stimuli used for PoTeC are copyrighted. In order for us to publish them together with the data, we acquired licenses for all text extracts. The stimulus texts and the licenses are available below.

PoTeC Stimuli (TSV, 31 KB)

PoTeC Stimuli Licenses (ZIP, 1 MB)

The licenses for the texts are valid only for us. If you intend to publish the data through any channel other than this website, you will have to acquire your own licenses.
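Once downloaded, the stimulus file can be inspected with a few lines of Python. The sketch below is only an illustration: the column names in the toy sample (`text_id`, `text`) are made up, the helper `read_tsv` is hypothetical, and the real file's columns and encoding should be checked after download.

```python
import csv
import io


def read_tsv(handle):
    """Read tab-separated text from an open file handle.

    Returns the column names and a list of rows, each row a dict
    keyed by column name.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return list(reader.fieldnames or []), rows


# Toy in-memory sample with made-up columns; it stands in for the
# downloaded stimulus file, whose actual columns may differ.
sample = io.StringIO("text_id\ttext\nexample_1\tEin Beispielsatz.\n")
columns, rows = read_tsv(sample)

print(columns)
print(len(rows))
```

For the downloaded file, the same function can be called on a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever you saved the TSV and UTF-8 is an assumption about its encoding.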

All other data, such as raw eye-tracking data, reading measures, and fixation sequences, can be downloaded from GitHub and OSF.

Citation

PoTeC: A German Naturalistic Eye-tracking-while-reading Corpus, Behaviour Research Methods,
Deborah N. Jakobi, Thomas Kern, David R. Reich, Patrick Haller, and Lena A. Jäger (in press)
[arXiv preprint | code | bib]

Department of Computational Linguistics Digital Linguistics
