
The CASE project

Compiling a corpus of informal English as a Lingua Franca (ELF) conversations

The CASE project was started in 2012 at Saarland University with the aim of collecting video-mediated conversations in an international English-language context and thus creating a dataset or "corpus" that allows research on this particular communication type. By 2018, teams of researchers from Germany, Bulgaria, Spain, Italy, Sweden, Finland, France, Belgium, the UK and the US had compiled more than 250 hours of conversations using Skype as a medium. The conversations are first encounters between two participants from different countries and last between 30 and 60 minutes.

Of particular interest to us are pragmatics and discourse in a video-mediated communication setting, cultural and intercultural negotiation, issues of identity, the role of plurilingual resources, and the influence of the communication medium on issues such as rapport and cooperation in an international setting. 

Conversations are transcribed according to pragmatic transcription guidelines, with the aim of allowing for a wide range of applications and with a particular focus on spoken language features, multimodality, and the use of plurilingual resources. Our team of researchers has published several papers on various aspects of the project, some of which are available online. A detailed description of the issues related to the analysis of spoken data, with extensive examples, can be found in:

  • Brunner, Marie-Louise; Stefan Diemer; and Selina Schmidt. 2017. “... okay so good luck with that ((laughing))?” - Managing rich data in a corpus of Skype conversations. Studies in Variation, Contacts and Change in English 19 [Big and Rich Data in English Corpus Linguistics: Methods and explorations, ed. by Turo Hiltunen; Joe McVeigh; and Tanja Säily]. Helsinki: Varieng. Full text: [http://www.helsinki.fi/varieng/series/volumes/19/brunner_diemer_schmidt/].

The recordings were completed in 2018 with a total of more than 250 hours of data. The raw data has been used for various qualitative studies.

  • CASE. 2018. Corpus of Academic Spoken English – Recordings. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case].

While the CASE project is still ongoing, several preliminary datasets have been analyzed and discussed in our publications. A preliminary set of 20 conversations, BabyCASE, was compiled in 2017. The preliminary datasets are:

  • BabyCASE. 2017. 20 conversations from the CASE project. Birkenfeld: Trier University of Applied Sciences & Saarbrücken: Saarland University. [http://umwelt-campus.de/case].
  • FoodCASE. 2015. Conversations about food from the CASE project. Birkenfeld: Trier University of Applied Sciences & Saarbrücken: Saarland University. [http://umwelt-campus.de/case].
  • FoodCASE v2. 2017. Conversations about food from the CASE project. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case].

In April 2018, the first finalized corpus based on data from the CASE project will be released for scientific use: ViMELF.

ViMELF - A Corpus of Video-Mediated English as a Lingua Franca Conversations

ViMELF contains 20 Skype conversations between 40 speakers from Germany (20 speakers), Spain (5), Italy (5), Finland (5), and Bulgaria (5), totaling 744.5 minutes (ca. 12.4 hours), with an average conversation length of 37.23 minutes. The corpus comprises 113 677 words in the plain text version and 152 467 items in the annotated version (preliminary numbers).
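The summary figures for ViMELF can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the preliminary numbers quoted in this description; the variable names are illustrative assumptions and are not part of any corpus tooling.

```python
# Preliminary figures as quoted in the ViMELF description.
# All variable names here are illustrative, not part of the corpus.
speakers_by_country = {
    "Germany": 20,
    "Spain": 5,
    "Italy": 5,
    "Finland": 5,
    "Bulgaria": 5,
}
n_conversations = 20
total_minutes = 744.5

# Each conversation has two participants, so the speaker total
# should equal twice the number of conversations.
total_speakers = sum(speakers_by_country.values())
speakers_per_conversation = total_speakers / n_conversations

# Derived duration figures.
avg_minutes = total_minutes / n_conversations
total_hours = total_minutes / 60

print(f"speakers: {total_speakers}")
print(f"speakers per conversation: {speakers_per_conversation:.0f}")
print(f"average length: {avg_minutes:.3f} min")
print(f"total duration: {total_hours:.2f} h")
```

The same pattern extends to other corpus-level figures once final numbers are published.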

The transcripts are available as .docx and .txt files; the videos in MPEG4 format. Several versions are available: the fully annotated pragmatic version as text and XML, a lexical version, and a POS-tagged version (auto-tagged with CLAWS).

Citations

Citing ViMELF - A Corpus of Video-Mediated English as a Lingua Franca Conversations:

ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case] (date of last access). 

Citing the CASE project: 

Long citation:

The CASE project. 2012-2018. Stefan Diemer; Marie-Louise Brunner; Caroline Collet; and Selina Schmidt. Birkenfeld: Trier University of Applied Sciences (coordination) / Saarbrücken: Saarland University / Sofia: St Kliment Ohridski University / Forlì: University of Bologna-Forlì / Santiago: University of Santiago de Compostela / Helsinki: Helsinki University & Hanken School of Economics / Birmingham: Birmingham City University / Växjö: Linnaeus University / Lyon: Université Lumière Lyon 2 / Louvain-la-Neuve: Université catholique de Louvain / Boise: Boise State University. [http://umwelt-campus.de/case] (date of last access).

Short citation:

The CASE project. 2012-2018. Birkenfeld: Trier University of Applied Sciences. [http://umwelt-campus.de/case] (date of last access).