The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations

Robbie Love, Claire Dembry, Andrew Hardie, Vaclav Brezina, Tony McEnery

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces the Spoken British National Corpus 2014, an
11.5-million-word corpus of orthographically transcribed conversations
among L1 speakers of British English from across the UK, recorded in the years
2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe
the main stages of the Spoken BNC2014’s creation: design, data and metadata
collection, transcription, XML encoding, and annotation. In doing so we aim
to (i) encourage users of the corpus to approach the data with sensitivity to the
many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora
of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both
logistically and practically, than in the past.
Original language: English
Pages (from-to): 319-344
Journal: International Journal of Corpus Linguistics
Volume: 22
Issue number: 3
Publication status: Published, 31 Dec 2017

Bibliographical note

© John Benjamins Publishing Company
This is an open access article under a CC BY license.
