A framework for automated construction of resource space based on background knowledge

Xu Yu, Li Peng, Zhixing Huang*, Hai Zhuge

*Corresponding author for this work

Research output: Contribution to journal (Special issue)

Abstract

The Resource Space Model is a data model that manages digital resources in cyber-physical systems effectively and flexibly from multidimensional and hierarchical perspectives. This paper focuses on constructing a resource space automatically. We propose a framework that organizes a set of digital resources along different semantic dimensions by combining human background knowledge from WordNet and Wikipedia. The construction process has four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities, and generating the resource space. An unsupervised statistical topic model, Latent Dirichlet Allocation (LDA), is applied to extract candidate keywords for the facets. To better interpret the meaning of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets, and construct the corresponding semantic graphs. Semantic communities are then identified with the GN algorithm. After candidate axes are extracted from the Wikipedia concept hierarchy, the final axes of the resource space are ranked and selected using three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively.
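
The community-detection step can be illustrated with a minimal sketch. It assumes that "GN" refers to the Girvan-Newman edge-betweenness method, and it is not the authors' implementation. The keyword graph, the function names, and the target number of communities are all hypothetical, and the edges stand in for keyword pairs that would already have passed some relatedness threshold.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(edges, target):
    """Remove the highest-betweenness edge until `target` communities exist."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    comps = components(adj)
    while len(comps) < target and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
    return comps


# Hypothetical keyword graph: two tightly related groups joined by one weak link.
EDGES = [
    ("cat", "dog"), ("dog", "horse"), ("cat", "horse"),
    ("car", "bus"), ("bus", "train"), ("car", "train"),
    ("horse", "car"),
]

communities = girvan_newman(EDGES, target=2)
print(communities)
```

In this toy graph the single link between the two groups carries every cross-group shortest path, so it is removed first and the two keyword groups separate. How the paper weights edges or decides when to stop splitting is not stated in the abstract.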

Original language: English
Pages (from-to): 222-231
Number of pages: 10
Journal: Future Generation Computer Systems
Volume: 32
Early online date: 5 Aug 2013
DOI: 10.1016/j.future.2013.07.017
Publication status: Published - Mar 2014

Fingerprint

Semantics
Data structures

Keywords

  • latent Dirichlet allocation
  • resource space model
  • semantic graph
  • Wikipedia

Cite this

@article{c7199a249d95456c87d1ec9027a13e36,
title = "A framework for automated construction of resource space based on background knowledge",
abstract = "Resource Space Model is a kind of data model which can effectively and flexibly manage the digital resources in cyber-physical system from multidimensional and hierarchical perspectives. This paper focuses on constructing resource space automatically. We propose a framework that organizes a set of digital resources according to different semantic dimensions combining human background knowledge in WordNet and Wikipedia. The construction process includes four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities and generating resource space. An unsupervised statistical language topic model (i.e., Latent Dirichlet Allocation) is applied to extract candidate keywords of the facets. To better interpret meanings of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets and construct corresponding semantic graphs. Moreover, semantic communities are identified by GN algorithm. After extracting candidate axes based on Wikipedia concept hierarchy, the final axes of resource space are sorted and picked out through three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively.",
keywords = "latent Dirichlet allocation, resource space model, semantic graph, Wikipedia",
author = "Xu Yu and Li Peng and Zhixing Huang and Hai Zhuge",
year = "2014",
month = "3",
doi = "10.1016/j.future.2013.07.017",
language = "English",
volume = "32",
pages = "222--231",
journal = "Future Generation Computer Systems",
issn = "0167-739X",
publisher = "Elsevier",
}

A framework for automated construction of resource space based on background knowledge. / Yu, Xu; Peng, Li; Huang, Zhixing; Zhuge, Hai.

In: Future Generation Computer Systems, Vol. 32, 03.2014, p. 222-231.

TY - JOUR

T1 - A framework for automated construction of resource space based on background knowledge

AU - Yu, Xu

AU - Peng, Li

AU - Huang, Zhixing

AU - Zhuge, Hai

PY - 2014/3

Y1 - 2014/3

AB - Resource Space Model is a kind of data model which can effectively and flexibly manage the digital resources in cyber-physical system from multidimensional and hierarchical perspectives. This paper focuses on constructing resource space automatically. We propose a framework that organizes a set of digital resources according to different semantic dimensions combining human background knowledge in WordNet and Wikipedia. The construction process includes four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities and generating resource space. An unsupervised statistical language topic model (i.e., Latent Dirichlet Allocation) is applied to extract candidate keywords of the facets. To better interpret meanings of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets and construct corresponding semantic graphs. Moreover, semantic communities are identified by GN algorithm. After extracting candidate axes based on Wikipedia concept hierarchy, the final axes of resource space are sorted and picked out through three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively.

KW - latent Dirichlet allocation

KW - resource space model

KW - semantic graph

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=84891634777&partnerID=8YFLogxK

U2 - 10.1016/j.future.2013.07.017

DO - 10.1016/j.future.2013.07.017

M3 - Special issue

AN - SCOPUS:84891634777

VL - 32

SP - 222

EP - 231

JO - Future Generation Computer Systems

JF - Future Generation Computer Systems

SN - 0167-739X

ER -