Novel visualization methods for protein data

Shahzad Mumtaz, Ian Nabney, Darren Flower

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of two non-linear data projection methods, Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS), that are adapted to be effective on very high-dimensional data. The adaptations use log-space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the adapted versions of GTM and GTM-FS work successfully on data with more than 2000 dimensions. We compare the results with those of other linear and nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
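The abstract does not say which EM steps are moved to log space, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the standard GTM noise model (isotropic Gaussian components with equal priors and a shared inverse variance `beta`) and uses the well-known log-sum-exp trick to normalise E-step responsibilities. The function names and the toy data are hypothetical.

```python
import math


def log_sum_exp(values):
    """Return log(sum(exp(v) for v in values)) without underflow or overflow."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_responsibilities(x, centres, beta):
    """Log posterior over mixture components for one data point x.

    Assumption: K isotropic Gaussian components with equal priors and a
    shared inverse variance beta.
    """
    d = len(x)
    log_norm = 0.5 * d * math.log(beta / (2.0 * math.pi))
    log_p = [
        log_norm - 0.5 * beta * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        for c in centres
    ]
    total = log_sum_exp(log_p)
    return [lp - total for lp in log_p]


# Hypothetical toy data: one 2000-dimensional point and three component centres.
D = 2000
x = [0.0] * D
centres = [[1.0] * D, [1.1] * D, [3.0] * D]
beta = 1.0

# Naive evaluation: every unnormalised density underflows to 0.0, so the
# usual normalisation step would divide zero by zero.
naive = [
    math.exp(-0.5 * beta * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
    for c in centres
]

# Log-space evaluation: the responsibilities are finite and sum to one.
resp = [math.exp(lr) for lr in log_responsibilities(x, centres, beta)]
```

With more than 2000 dimensions the squared distances in the exponent are large enough that direct exponentiation returns zero for every component. Subtracting the maximum log density before exponentiating keeps at least one term equal to one, which is why the normalisation stays well defined.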
Original language: English
Title of host publication: 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)
Publisher: IEEE
Pages: 198-205
Number of pages: 8
Publication status: Published - 2012
Event: 2012 IEEE symposium on computational intelligence in bioinformatics and computational biology - San Diego, California, United States
Duration: 9 May 2012 to 12 May 2012

Conference

Conference: 2012 IEEE symposium on computational intelligence in bioinformatics and computational biology
Country: United States
City: San Diego, California
Period: 9/05/12 to 12/05/12
Other: This symposium will bring together top researchers, practitioners, and students from around the world to discuss the latest advances in the field of Computational Intelligence and its application to real-world problems in biology, bioinformatics, computational biology, chemical informatics, bioengineering and related fields. Computational Intelligence (CI) approaches include artificial neural networks and machine learning techniques, fuzzy logic, evolutionary algorithms and meta-heuristics, hybrid approaches and other emerging techniques.

Fingerprint

Visualization
Proteins
Principal component analysis
Electrostatics
Experiments

Bibliographical note

© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Cite this

Mumtaz, S., Nabney, I., & Flower, D. (2012). Novel visualization methods for protein data. In 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) (pp. 198-205). IEEE.
Mumtaz, Shahzad ; Nabney, Ian ; Flower, Darren. / Novel visualization methods for protein data. 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE, 2012. pp. 198-205
@inproceedings{962b00bcdb684f7c9006c9e06d02f7db,
title = "Novel visualization methods for protein data",
abstract = "Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the variation in the original version of GTM and GTM-FS worked successfully with data of more than 2000 dimensions and we compare the results with other linear/nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).",
author = "Shahzad Mumtaz and Ian Nabney and Darren Flower",
note = "{\textcopyright} 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.",
year = "2012",
language = "English",
pages = "198--205",
booktitle = "2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)",
publisher = "IEEE",
address = "United States",

}

Mumtaz, S, Nabney, I & Flower, D 2012, Novel visualization methods for protein data. in 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE, pp. 198-205, 2012 IEEE symposium on computational intelligence in bioinformatics and computational biology, San Diego, California, United States, 9/05/12.

TY - GEN

T1 - Novel visualization methods for protein data

AU - Mumtaz, Shahzad

AU - Nabney, Ian

AU - Flower, Darren

N1 - © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

PY - 2012

Y1 - 2012

AB - Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the variation in the original version of GTM and GTM-FS worked successfully with data of more than 2000 dimensions and we compare the results with other linear/nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).

UR - http://www.scopus.com/inward/record.url?scp=84864071478&partnerID=8YFLogxK

M3 - Conference contribution

SP - 198

EP - 205

BT - 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)

PB - IEEE

ER -

Mumtaz S, Nabney I, Flower D. Novel visualization methods for protein data. In 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE. 2012. p. 198-205