Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective

Joyce Lim; Geraldine Mark; Pascual Pérez-Paredes; Anne O'Keeffe

English, Languages and Applied Linguistics

Research output: Contribution to journal › Article › peer-review

Abstract

This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without a starting point of an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al. 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language in the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research offers new evidence to our understanding of the development of L2 writing in EFL contexts.

Original language	English
Journal	Corpora
Volume	19
Issue number	1
Publication status	Accepted/In press - 7 Feb 2023

Bibliographical note

This is an Accepted Manuscript of an article published by Edinburgh University Press in Corpora. The Version of Record is available online at: http://www.euppublishing.com/doi/abs/[link here once published].

Embargoed Document

Corpora_Exploring-Part-of-Speech-POS-tag-sequences-in-a-large-scale-learner-corpus-of-L2-English
Accepted author manuscript, 728 KB
Embargo ends: 1/01/50

Cite this

@article{8df79cc309ca468b814484559a493bd2,

title = "Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective",

abstract = "This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without a starting point of an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al. 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language in the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research offers new evidence to our understanding of the development of L2 writing in EFL contexts.",

author = "Joyce Lim and Geraldine Mark and Pascual P{\'e}rez-Paredes and Anne O'Keeffe",

note = "This is an Accepted Manuscript of an article published by Edinburgh University Press in Corpora. The Version of Record is available online at: http://www.euppublishing.com/doi/abs/[link here once published].",

year = "2023",

month = feb,

day = "7",

language = "English",

volume = "19",

journal = "Corpora",

issn = "1749-5032",

publisher = "Edinburgh University Press",

number = "1",

}

TY - JOUR

T1 - Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective

AU - Lim, Joyce

AU - Mark, Geraldine

AU - Pérez-Paredes, Pascual

AU - O'Keeffe, Anne

N1 - This is an Accepted Manuscript of an article published by Edinburgh University Press in Corpora. The Version of Record is available online at: http://www.euppublishing.com/doi/abs/[link here once published].

PY - 2023/2/7

Y1 - 2023/2/7

N2 - This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without a starting point of an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al. 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language in the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research offers new evidence to our understanding of the development of L2 writing in EFL contexts.

AB - This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without a starting point of an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al. 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language in the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research offers new evidence to our understanding of the development of L2 writing in EFL contexts.

M3 - Article

SN - 1749-5032

VL - 19

JO - Corpora

JF - Corpora

IS - 1

ER -
