A comparative evaluation of term recognition algorithms

Ziqi Zhang; José Iria; Christopher Brewster; Fabio Ciravegna

Research output: Chapter in Book/Published conference output › Chapter

Abstract

Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach using a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that choice and design of corpus has a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems to tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects and this means information extraction techniques need to be integrated into the term recognition process.

Original language	English
Title of host publication	Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC08)
Pages	2108-2111
Number of pages	6
Publication status	Published - May 2008
Event	6th International Conference on Language Resources and Evaluation - Marrakech, Morocco Duration: 1 May 2008 → …

Conference

Conference	6th International Conference on Language Resources and Evaluation
Abbreviated title	LREC 2008
Country/Territory	Morocco
City	Marrakech
Period	1/05/08 → …

Keywords

automatic term recognition
ATR
semantic search
ontology learning

Dialogue, speech and images: the companions project data set
Wilks, Y., Benyon, D., Brewster, C., Ircing, P. & Mival, O., Jun 2008, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC08). 4 p.
Research output: Chapter in Book/Published conference output › Chapter

Cite this

@inbook{7396bae9bbd547d785e1d107b2377f41,
  title = "A comparative evaluation of term recognition algorithms",
  abstract = "Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach using a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that choice and design of corpus has a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems to tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects and this means information extraction techniques need to be integrated into the term recognition process.",
  keywords = "automatic term recognition, ATR, semantic search, ontology learning",
  author = "Ziqi Zhang and Jos{\'e} Iria and Christopher Brewster and Fabio Ciravegna",
  year = "2008",
  month = may,
  language = "English",
  pages = "2108--2111",
  booktitle = "Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC08)",
  note = "6th International Conference on Language Resources and Evaluation, LREC 2008 ; Conference date: 01-05-2008",
}

TY - CHAP
T1 - A comparative evaluation of term recognition algorithms
AU - Zhang, Ziqi
AU - Iria, José
AU - Brewster, Christopher
AU - Ciravegna, Fabio
PY - 2008/5
Y1 - 2008/5

N2 - Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach using a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that choice and design of corpus has a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems to tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects and this means information extraction techniques need to be integrated into the term recognition process.

AB - Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach using a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that choice and design of corpus has a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems to tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects and this means information extraction techniques need to be integrated into the term recognition process.

KW - automatic term recognition
KW - ATR
KW - semantic search
KW - ontology learning
M3 - Chapter
SP - 2108
EP - 2111
BT - Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC08)
T2 - 6th International Conference on Language Resources and Evaluation
Y2 - 1 May 2008
ER -
