Semisupervised learning of hierarchical latent trait models for data visualization

Ian T. Nabney; Yi Sun; Peter Tiňo; Ata Kabán

doi:10.1109/TKDE.2005.49

Semisupervised learning of hierarchical latent trait models for data visualization

Ian T. Nabney, Yi Sun, Peter Tiňo, Ata Kabán

Computer Science Research Group

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.

Original language	English
Pages (from-to)	384-400
Number of pages	17
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	17
Issue number	3
DOIs	https://doi.org/10.1109/TKDE.2005.49
Publication status	Published - Mar 2005

Bibliographical note

© 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Access to Document

10.1109/TKDE.2005.49

Semisupervised learning of hierarchical latent trait models for data visualization
© 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Accepted author manuscript, 9.19 MB

Cite this

@article{3a9881bcfde94d65a9a1e1e8508b59fd,

title = "Semisupervised learning of hierarchical latent trait models for data visualization",

abstract = "Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects {"}regions of interest,{"} whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.",

author = "Nabney, {Ian T.} and Yi Sun and Peter Ti{\v n}o and Ata Kab{\'a}n",

note = "{\textcopyright} 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works",

year = "2005",

month = mar,

doi = "10.1109/TKDE.2005.49",

language = "English",

volume = "17",

pages = "384--400",

journal = "IEEE Transactions on Knowledge and Data Engineering",

issn = "1041-4347",

publisher = "IEEE",

number = "3",

}

TY - JOUR

T1 - Semisupervised learning of hierarchical latent trait models for data visualization

AU - Nabney, Ian T.

AU - Sun, Yi

AU - Tiňo, Peter

AU - Kabán, Ata

N1 - © 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

PY - 2005/3

Y1 - 2005/3

N2 - Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.

AB - Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.

UR - http://www.scopus.com/inward/record.url?scp=14644433067&partnerID=8YFLogxK

UR - http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1388248

U2 - 10.1109/TKDE.2005.49

DO - 10.1109/TKDE.2005.49

M3 - Article

AN - SCOPUS:14644433067

SN - 1041-4347

VL - 17

SP - 384

EP - 400

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

IS - 3

ER -

Semisupervised learning of hierarchical latent trait models for data visualization

Abstract

Bibliographical note

Access to Document

Other files and links

Fingerprint

Cite this