The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations

Robbie Love; Claire Dembry; Andrew Hardie; Vaclav Brezina; Tony McEnery

doi:10.1075/ijcl.22.3.02lov

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.

Original language	English
Pages (from-to)	319-344
Journal	International Journal of Corpus Linguistics
Volume	22
Issue number	3
DOIs	https://doi.org/10.1075/ijcl.22.3.02lov
Publication status	Published - 31 Dec 2017

Bibliographical note

© John Benjamins Publishing Company
This is an open access article under a OA CC BY license

Access to Document

10.1075/ijcl.22.3.02lov
Licence: CC BY 3.0

The Spoken BNC2014
Final published version, 201 KB
Licence: CC BY 3.0

Cite this

@article{6a7cbcac24b143b29c785d269fea2c5b,

title = "The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations",

abstract = "This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014{\textquoteright}s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.",

author = "Robbie Love and Claire Dembry and Andrew Hardie and Vaclav Brezina and Tony McEnery",

note = "{\textcopyright} John Benjamins Publishing Company This is an open access article under a OA CC BY license",

year = "2017",

month = dec,

day = "31",

doi = "10.1075/ijcl.22.3.02lov",

language = "English",

volume = "22",

pages = "319--344",

journal = "International Journal of Corpus Linguistics",

issn = "1384-6655",

publisher = "John Benjamins",

number = "3",

}

TY - JOUR

T1 - The Spoken BNC2014

T2 - Designing and building a spoken corpus of everyday conversations

AU - Love, Robbie

AU - Dembry, Claire

AU - Hardie, Andrew

AU - Brezina, Vaclav

AU - McEnery, Tony

PY - 2017/12/31

Y1 - 2017/12/31

N2 - This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.

AB - This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.

UR - https://benjamins.com/catalog/ijcl.22.3.02lov/fulltext

U2 - 10.1075/ijcl.22.3.02lov

DO - 10.1075/ijcl.22.3.02lov

M3 - Article

SN - 1384-6655

VL - 22

SP - 319

EP - 344

JO - International Journal of Corpus Linguistics

JF - International Journal of Corpus Linguistics

IS - 3

ER -
