Data visualization during the early stages of drug discovery

Dharmesh M. Maniyar; Ian T. Nabney; Bruce S. Williams; Andreas Sewing

doi:10.1021/ci050471a

Dharmesh M. Maniyar^*, Ian T. Nabney, Bruce S. Williams, Andreas Sewing

^*Corresponding author for this work

Computer Science Research Group

Research output: Contribution to journal › Article › peer-review

Abstract

Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. © 2006 American Chemical Society.

Original language	English
Pages (from-to)	1806-1818
Number of pages	13
Journal	Journal of Chemical Information and Modeling
Volume	46
Issue number	4
DOIs	https://doi.org/10.1021/ci050471a
Publication status	Published - 2006

Keywords

segmentation
hierarchical visualisation
local predictive models
high-throughput screening

Access to Document

10.1021/ci050471a

Cite this

@article{2f14588b16c24903be0c5b72487ee010,

title = "Data visualization during the early stages of drug discovery",

abstract = "Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. {\textcopyright} 2006 American Chemical Society.",

keywords = "segmentation, hierarchical visualisation, local predictive models, high-throughput screening",

author = "Maniyar, {Dharmesh M.} and Nabney, {Ian T.} and Williams, {Bruce S.} and Sewing, Andreas",

year = "2006",

doi = "10.1021/ci050471a",

language = "English",

volume = "46",

pages = "1806--1818",

journal = "Journal of Chemical Information and Modeling",

issn = "1549-9596",

publisher = "American Chemical Society",

number = "4",

}

TY - JOUR

T1 - Data visualization during the early stages of drug discovery

AU - Maniyar, Dharmesh M.

AU - Nabney, Ian T.

AU - Williams, Bruce S.

AU - Sewing, Andreas

PY - 2006

Y1 - 2006

AB - Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. © 2006 American Chemical Society.

KW - segmentation

KW - hierarchical visualisation

KW - local predictive models

KW - high-throughput screening

UR - http://www.scopus.com/inward/record.url?scp=33746893423&partnerID=8YFLogxK

UR - http://pubs.acs.org/doi/full/10.1021/ci050471a

U2 - 10.1021/ci050471a

DO - 10.1021/ci050471a

M3 - Article

C2 - 16859312

SN - 1549-9596

VL - 46

SP - 1806

EP - 1818

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

IS - 4

ER -
