Comparing sentence-level features for authorship analysis in Portuguese

Rui Sousa-Silva, Luís Sarmento, Tim Grant, Eugénio Oliveira, Belinda Maia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we compare the robustness of several types of stylistic markers to help discriminate authorship at sentence level. We train an SVM-based classifier using each set of features separately and perform sentence-level authorship analysis over a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation and word / sentence length contribute to a more robust sentence-level authorship analysis.
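
As a rough illustration of the kind of sentence-level markers the abstract mentions, the sketch below computes punctuation counts and word / sentence length features for a single sentence. It is a minimal sketch under stated assumptions: the function name `sentence_features`, the punctuation inventory and the feature names are hypothetical and are not taken from the paper, and POS-based features are left out because they would need a Portuguese tagger.

```python
import string

# Hypothetical punctuation inventory; the paper's actual feature set is not given here.
PUNCTUATION = ",.;:!?-()\""


def sentence_features(sentence: str) -> dict[str, float]:
    """Return punctuation counts and length features for one sentence."""
    # Split on whitespace and strip ASCII punctuation from the token edges.
    words = [w.strip(string.punctuation) for w in sentence.split()]
    words = [w for w in words if w]

    # One count per punctuation mark in the (assumed) inventory.
    features = {f"punct_{ch}": float(sentence.count(ch)) for ch in PUNCTUATION}

    # Sentence length in words and characters, plus mean word length.
    features["n_words"] = float(len(words))
    features["n_chars"] = float(len(sentence))
    features["mean_word_len"] = (
        sum(len(w) for w in words) / len(words) if words else 0.0
    )
    return features


example = sentence_features("O governo, porém, não respondeu.")
print(example["punct_,"], example["n_words"], example["mean_word_len"])
```

Feature vectors of this shape could then be passed, one feature set at a time, to an SVM classifier; which SVM implementation and kernel the authors used is not stated in this record.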

Language: English
Title of host publication: Computational processing of the Portuguese language
Subtitle of host publication: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings
Editors: Thiago Alexandre Salgueiro Pardo, António Branco, Aldebaro Klautau, et al
Place of Publication: Berlin (DE)
Publisher: Springer
Pages: 51-54
Number of pages: 4
ISBN (Electronic): 978-3-642-12320-7
ISBN (Print): 978-3-642-12319-1
DOIs: https://doi.org/10.1007/978-3-642-12320-7_7
Publication status: Published - 23 Dec 2010
Event: 9th International Conference on Computational Processing of the Portuguese Language - Porto Alegre, RS, Brazil
Duration: 27 Apr 2010 to 30 Apr 2010

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 6001
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Computational Processing of the Portuguese Language
Abbreviated title: PROPOR 2010
Country: Brazil
City: Porto Alegre, RS
Period: 27/04/10 to 30/04/10

Fingerprint

Classifiers
Classifier
Robustness
Corpus

Cite this

Sousa-Silva, R., Sarmento, L., Grant, T., Oliveira, E., & Maia, B. (2010). Comparing sentence-level features for authorship analysis in Portuguese. In T. A. Salgueiro Pardo, A. Branco, A. Klautau, & et al (Eds.), Computational processing of the Portuguese language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings (pp. 51-54). (Lecture Notes in Computer Science; Vol. 6001). Berlin (DE): Springer. https://doi.org/10.1007/978-3-642-12320-7_7
Sousa-Silva, Rui ; Sarmento, Luís ; Grant, Tim ; Oliveira, Eugénio ; Maia, Belinda. / Comparing sentence-level features for authorship analysis in Portuguese. Computational processing of the Portuguese language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings. editor / Thiago Alexandre Salgueiro Pardo ; António Branco ; Aldebaro Klautau ; et al. Berlin (DE) : Springer, 2010. pp. 51-54 (Lecture Notes in Computer Science).
@inproceedings{de55702fc7b44e368284cc1f15fc9bd1,
title = "Comparing sentence-level features for authorship analysis in Portuguese",
abstract = "In this paper we compare the robustness of several types of stylistic markers to help discriminate authorship at sentence level. We train an SVM-based classifier using each set of features separately and perform sentence-level authorship analysis over a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation and word / sentence length contribute to a more robust sentence-level authorship analysis.",
author = "Rui Sousa-Silva and Lu{\'i}s Sarmento and Tim Grant and Eug{\'e}nio Oliveira and Belinda Maia",
year = "2010",
month = "12",
day = "23",
doi = "10.1007/978-3-642-12320-7_7",
language = "English",
isbn = "978-3-642-12319-1",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "51--54",
editor = "{Salgueiro Pardo}, {Thiago Alexandre} and Ant{\'o}nio Branco and Aldebaro Klautau and {et al}",
booktitle = "Computational processing of the Portuguese language",
address = "Germany",

}

Sousa-Silva, R, Sarmento, L, Grant, T, Oliveira, E & Maia, B 2010, Comparing sentence-level features for authorship analysis in Portuguese. in TA Salgueiro Pardo, A Branco, A Klautau & et al (eds), Computational processing of the Portuguese language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings. Lecture Notes in Computer Science, vol. 6001, Springer, Berlin (DE), pp. 51-54, 9th International Conference on Computational Processing of the Portuguese Language, Porto Alegre, RS, Brazil, 27/04/10. https://doi.org/10.1007/978-3-642-12320-7_7

Comparing sentence-level features for authorship analysis in Portuguese. / Sousa-Silva, Rui; Sarmento, Luís; Grant, Tim; Oliveira, Eugénio; Maia, Belinda.

Computational processing of the Portuguese language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings. ed. / Thiago Alexandre Salgueiro Pardo; António Branco; Aldebaro Klautau; et al. Berlin (DE) : Springer, 2010. p. 51-54 (Lecture Notes in Computer Science; Vol. 6001).

TY - GEN

T1 - Comparing sentence-level features for authorship analysis in Portuguese

AU - Sousa-Silva, Rui

AU - Sarmento, Luís

AU - Grant, Tim

AU - Oliveira, Eugénio

AU - Maia, Belinda

PY - 2010/12/23

Y1 - 2010/12/23

AB - In this paper we compare the robustness of several types of stylistic markers to help discriminate authorship at sentence level. We train an SVM-based classifier using each set of features separately and perform sentence-level authorship analysis over a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation and word / sentence length contribute to a more robust sentence-level authorship analysis.

UR - http://www.scopus.com/inward/record.url?scp=78650257512&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-12320-7_7

DO - 10.1007/978-3-642-12320-7_7

M3 - Conference contribution

SN - 978-3-642-12319-1

T3 - Lecture Notes in Computer Science

SP - 51

EP - 54

BT - Computational processing of the Portuguese language

A2 - Salgueiro Pardo, Thiago Alexandre

A2 - Branco, António

A2 - Klautau, Aldebaro

A2 - et al,

PB - Springer

CY - Berlin (DE)

ER -

Sousa-Silva R, Sarmento L, Grant T, Oliveira E, Maia B. Comparing sentence-level features for authorship analysis in Portuguese. In Salgueiro Pardo TA, Branco A, Klautau A, et al, editors, Computational processing of the Portuguese language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings. Berlin (DE): Springer. 2010. p. 51-54. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-642-12320-7_7