Semantic and Heuristic Based Approach for Paraphrase Identification

Muhidin A. Mohamed, Mourad Oussalah

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

In this paper, we propose a semantic-based approach to paraphrase identification. The core idea is to identify paraphrases when sentences contain a set of named entities and common words. The approach computes the semantic similarity of named-entity tokens separately from that of the rest of the sentence text. More specifically, it integrates word semantic similarity derived from WordNet taxonomic relations with named-entity semantic relatedness inferred from the crowd-sourced knowledge in Wikipedia. In addition, we improve the WordNet similarity measure by nominalizing verbs, adjectives and adverbs with the aid of the Categorial Variation database (CatVar). The paraphrase identification system is evaluated on two datasets, namely the Microsoft Research Paraphrase Corpus (MSRPC) and TREC-9 Question Variants. Experimental results on these datasets show that our system outperforms the baselines in the paraphrase identification task.
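As a rough illustration of the idea in the abstract, the sketch below scores named-entity tokens and ordinary words separately and then merges the two scores. It is not the authors' formula: the best-match averaging, the token-count weighting, the 0.5 threshold, and every function name here are assumptions made for illustration. The toy similarity tables stand in for the WordNet-based and Wikipedia-based measures, which the abstract names but does not specify.

```python
from typing import Callable, Sequence

Similarity = Callable[[str, str], float]


def best_match_average(source: Sequence[str], target: Sequence[str], sim: Similarity) -> float:
    """Average, over source tokens, of the best similarity to any target token."""
    if not source or not target:
        return 0.0
    return sum(max(sim(s, t) for t in target) for s in source) / len(source)


def symmetric_similarity(a: Sequence[str], b: Sequence[str], sim: Similarity) -> float:
    """Mean of the two directional best-match averages."""
    return (best_match_average(a, b, sim) + best_match_average(b, a, sim)) / 2


def sentence_similarity(words_a, words_b, entities_a, entities_b,
                        word_sim: Similarity, entity_sim: Similarity) -> float:
    """Combine word and named-entity scores, weighted by token counts (an assumption)."""
    n_words = len(words_a) + len(words_b)
    n_entities = len(entities_a) + len(entities_b)
    total = n_words + n_entities
    if total == 0:
        return 0.0
    word_score = symmetric_similarity(words_a, words_b, word_sim)
    entity_score = symmetric_similarity(entities_a, entities_b, entity_sim)
    return (n_words * word_score + n_entities * entity_score) / total


def is_paraphrase(score: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: the threshold value is illustrative only."""
    return score >= threshold


# Toy stand-ins for the real measures (hypothetical values, not from the paper).
_TOY_WORD_TABLE = {frozenset({"purchase", "acquisition"}): 0.9}


def toy_word_sim(x: str, y: str) -> float:
    """Placeholder for a WordNet-derived word similarity."""
    if x == y:
        return 1.0
    return _TOY_WORD_TABLE.get(frozenset({x, y}), 0.0)


def toy_entity_sim(x: str, y: str) -> float:
    """Placeholder for a Wikipedia-derived named-entity relatedness."""
    return 1.0 if x == y else 0.0


example_score = sentence_similarity(
    ["purchase", "company"], ["acquisition", "company"],
    ["Microsoft"], ["Microsoft"],
    toy_word_sim, toy_entity_sim,
)
```

Keeping the two similarity functions as parameters mirrors the separation described in the abstract: either measure can be swapped out without changing how the sentence-level score is assembled.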

Original language: English
Title of host publication: Proceedings - 2018 14th International Conference on Semantics, Knowledge and Grids, SKG 2018
Publisher: IEEE
Pages: 203-210
Number of pages: 8
ISBN (Electronic): 9781728104416
ISBN (Print): 978-1-7281-0442-3
DOIs
Publication status: Published - 30 Apr 2019
Event: 14th International Conference on Semantics, Knowledge and Grids, SKG 2018 - Guangzhou, China
Duration: 12 Sept 2018 to 14 Sept 2018

Publication series

Name: 2018 14th International Conference on Semantics, Knowledge and Grids (SKG)
Publisher: IEEE
ISSN (Print): 2325-0623

Conference

Conference: 14th International Conference on Semantics, Knowledge and Grids, SKG 2018
Country/Territory: China
City: Guangzhou
Period: 12/09/18 to 14/09/18

Keywords

  • Named-entity relatedness
  • Paraphrase identification
  • Sentence semantic similarity
  • Word category subsumption
