Semi-supervised learning of statistical models for natural language understanding

Deyu Zhou; Yulan He

doi:10.1155/2014/121650

Semi-supervised learning of statistical models for natural language understanding

Deyu Zhou, Yulan He

Computer Science Research Group

Research output: Contribution to journal › Article › peer-review

Abstract

Natural language understanding is to specify a computational model that maps sentences to their semantic mean representation. In this paper, we propose a novel framework to train the statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied on two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, previously proposed hidden vector state (HVS) model which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance than two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with a relative error reduction rate of about 25% and 15% being achieved in F-measure.

Original language	English
Article number	121650
Number of pages	11
Journal	Scientific world journal
Volume	2014
DOIs	https://doi.org/10.1155/2014/121650
Publication status	Published - 20 Jul 2014

Bibliographical note

Copyright © 2014 Deyu Zhou and Yulan He. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Funding: National Natural Science Foundation of China (61103077), Ph.D. Programs Foundation of Ministry of Education of China for Young Faculties (20100092120031), Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the Fundamental Research Funds for the Central Universities (the Cultivation Program for Young Faculties of Southeast University).

Access to Document

10.1155/2014/121650

Semi-supervised learning of statistical models for natural language understanding
Copyright © 2014 Deyu Zhou and Yulan He. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Final published version, 2 MBLicence: CC BY 3.0

http://www.hindawi.com/journals/tswj/2014/121650/

Cite this

@article{674f23da270d4eb1b17c4833af08d888,

title = "Semi-supervised learning of statistical models for natural language understanding",

abstract = "Natural language understanding is to specify a computational model that maps sentences to their semantic mean representation. In this paper, we propose a novel framework to train the statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied on two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, previously proposed hidden vector state (HVS) model which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance than two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with a relative error reduction rate of about 25% and 15% being achieved in F-measure. ",

author = "Deyu Zhou and Yulan He",

note = "Copyright {\textcopyright} 2014 Deyu Zhou and Yulan He. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Funding: National Natural Science Foundation of China (61103077), Ph.D. Programs Foundation of Ministry of Education of China for Young Faculties (20100092120031), Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the Fundamental Research Funds for the Central Universities (the Cultivation Program for Young Faculties of Southeast University).",

year = "2014",

month = jul,

day = "20",

doi = "10.1155/2014/121650",

language = "English",

volume = "2014",

journal = "Scientific world journal",

issn = "2356-6140",

publisher = "Hindawi Limited",

}

TY - JOUR

T1 - Semi-supervised learning of statistical models for natural language understanding

AU - Zhou, Deyu

AU - He, Yulan

N1 - Copyright © 2014 Deyu Zhou and Yulan He. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Funding: National Natural Science Foundation of China (61103077), Ph.D. Programs Foundation of Ministry of Education of China for Young Faculties (20100092120031), Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the Fundamental Research Funds for the Central Universities (the Cultivation Program for Young Faculties of Southeast University).

PY - 2014/7/20

Y1 - 2014/7/20

N2 - Natural language understanding is to specify a computational model that maps sentences to their semantic mean representation. In this paper, we propose a novel framework to train the statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied on two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, previously proposed hidden vector state (HVS) model which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance than two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with a relative error reduction rate of about 25% and 15% being achieved in F-measure.

AB - Natural language understanding is to specify a computational model that maps sentences to their semantic mean representation. In this paper, we propose a novel framework to train the statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied on two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, previously proposed hidden vector state (HVS) model which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance than two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with a relative error reduction rate of about 25% and 15% being achieved in F-measure.

UR - http://www.scopus.com/inward/record.url?scp=84925884774&partnerID=8YFLogxK

U2 - 10.1155/2014/121650

DO - 10.1155/2014/121650

M3 - Article

C2 - 25152899

AN - SCOPUS:84925884774

SN - 2356-6140

VL - 2014

JO - Scientific world journal

JF - Scientific world journal

M1 - 121650

ER -

Semi-supervised learning of statistical models for natural language understanding

Abstract

Bibliographical note

Access to Document

Other files and links

Fingerprint

Cite this