

The paper contains a description of the Spoken Language Corpus of Swedish at the Department of Linguistics, Göteborg University (GSLC), and a summary of the various types of analysis and tools that have been developed for work on this corpus. The corpus was initiated to meet a growing interest in naturalistic spoken language data. Work on the corpus was started in the late 1970s. It is incrementally growing and presently consists of 1.3 million words from about 25 different social activities.

This paper discusses problems that arise in trying to transfer a spoken language corpus transcribed and formatted according to one standard into the standard and format of another corpus. The problems that arise are related both to the differences that exist between the standards of the corpora and to human errors that reduce the reliability of the transcriptions. Although the discussion is based on transfer and transliteration between two specific corpora (the Danish BySoc, BySociolingvistisk Korpus, and the Swedish GSLC, Göteborg Spoken Language Corpus), we believe that it documents and highlights problems of a general kind, which have to be faced whenever spoken language corpora of different formats are to be compared.

We understand phenomena better through comparison and contrast.

Comparison of languages and linguistic data is essential if progress in our understanding of the nature of spoken languages is to be made.

The corpus is incrementally growing and presently consists of 1.3 million words from about 25 different social activities. It is based on the fact that spoken language varies considerably in different social activities with regard to pronunciation, vocabulary, grammar and communicative functions. The goal of the corpus is to include spoken language from as many social activities as possible, to get a more complete understanding of the role of language and communication in human social life. This type of spoken language corpus is still fairly unusual even for English, since many spoken language corpora (certainly for Swedish) have been collected for special purposes, like speech recognition, phonetics, dialectal variation or interaction with a computerized dialog system in a very narrow domain, e.g. MapTask (Isard and Carletta 1995), TRAINS (Heeman and Allen 1994) and Waxholm (Blomberg et al. 1993).
