The SweLL Language Learner Corpus: From Design to Annotation

Elena Volodina
Lena Granstedt
Arild Matsson
Beáta Megyesi
Ildikó Pilán
Julia Prentice
Dan Rosén
Lisa Rudebeck
Carl-Johan Schenström
Gunlög Sundberg
Mats Wirén


The article presents a new language learner corpus for Swedish, SweLL, and the methodology behind it, from collection and pseudonymisation, which protects learners' personal information, to annotation adapted to second language learning. The main aim is to deliver a well-annotated corpus of essays written by second language learners of Swedish and make it available for research through a browsable environment. To that end, a new annotation tool and a new project management tool have been implemented, both with the main purpose of ensuring the reliability and quality of the final corpus. In the article we discuss the reasoning behind metadata selection and the principles of gold corpus compilation, and argue for the separation of normalization from correction annotation.

