Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes

Muhidin Mohamed, Mourad Oussalah

Research output: Contribution to journal › Article

Abstract

This paper describes an approach to named entity classification based on Wikipedia article infoboxes. It identifies the three fundamental named entity types: Person, Location, and Organization. An entity is classified by matching the attributes extracted from its article infobox against core entity attributes built from Wikipedia Infobox Templates. Experimental results show that the classifier achieves accuracy and F-measure scores of 97%. Based on this approach, a database of around 1.6 million named entities of these three types was created from the 20140203 Wikipedia dump. Experiments on the CoNLL-2003 shared task named entity recognition (NER) dataset demonstrated the system's strong performance in comparison with three state-of-the-art systems.
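The attribute-matching idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the contents of `CORE_ATTRIBUTES`, the helper names `extract_attributes` and `classify_entity`, the `min_matches` threshold, and the simple overlap-count scoring are hypothetical and are not taken from the paper, whose core attribute sets are built from Wikipedia Infobox Templates and whose matching procedure may differ.

```python
import re

# Hypothetical core attribute sets. The paper builds its own from Wikipedia
# Infobox Templates; these few names are placeholders for illustration only.
CORE_ATTRIBUTES = {
    "Person": {"birth_date", "birth_place", "occupation", "spouse", "nationality"},
    "Location": {"coordinates", "population_total", "area_total_km2", "timezone"},
    "Organization": {"founded", "headquarters", "industry", "num_employees"},
}

# Matches "| name = value" lines of infobox wikitext and captures the name.
ATTRIBUTE_LINE = re.compile(r"^\s*\|\s*([A-Za-z0-9_ ]+?)\s*=", re.MULTILINE)


def extract_attributes(infobox_wikitext):
    """Return the normalized attribute names found in infobox wikitext."""
    names = ATTRIBUTE_LINE.findall(infobox_wikitext)
    return {name.strip().lower().replace(" ", "_") for name in names}


def classify_entity(infobox_wikitext, min_matches=1):
    """Pick the type whose core attributes overlap most with the infobox.

    Returns None when no type reaches min_matches or the best score is tied.
    Overlap counting is an assumed scoring rule, not the paper's.
    """
    attributes = extract_attributes(infobox_wikitext)
    scores = {t: len(attributes & core) for t, core in CORE_ATTRIBUTES.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_type, best_score = ranked[0]
    if best_score < min_matches or best_score == ranked[1][1]:
        return None
    return best_type


example = """{{Infobox person
| name = Ada Lovelace
| birth_date = 10 December 1815
| birth_place = London, England
| occupation = Mathematician
}}"""

print(classify_entity(example))  # prints "Person"
```

Returning `None` on a tie or on too few matches keeps ambiguous articles out of the typed database rather than guessing a type; a real system would need a much larger attribute inventory than the placeholder sets used here.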
Original language: English
Journal: International Journal of Advanced Computer Science and Applications
Volume: 5
Issue number: 7
DOIs: https://doi.org/10.14569/IJACSA.2014.050725
Publication status: Published - 1 Jul 2014

Bibliographical note

This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

Cite this

Mohamed, M., & Oussalah, M. (2014). Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes. International Journal of Advanced Computer Science and Applications, 5(7). https://doi.org/10.14569/IJACSA.2014.050725
@article{04e313a5d6fa4053b228f1e0312cf695,
title = "Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes",
author = "Muhidin Mohamed and Mourad Oussalah",
note = "This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.",
year = "2014",
month = "7",
day = "1",
doi = "10.14569/IJACSA.2014.050725",
language = "English",
volume = "5",
number = "7",

}

TY  - JOUR
T1  - Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes
AU  - Mohamed, Muhidin
AU  - Oussalah, Mourad
N1  - This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
PY  - 2014/7/1
Y1  - 2014/7/1
UR  - https://thesai.org/Publications/ViewPaper?Volume=5&Issue=7&Code=ijacsa&SerialNo=25
U2  - 10.14569/IJACSA.2014.050725
DO  - 10.14569/IJACSA.2014.050725
M3  - Article
VL  - 5
IS  - 7
ER  -
