Data visualization during the early stages of drug discovery

Dharmesh M. Maniyar, Ian T. Nabney, Bruce S. Williams, Andreas Sewing

Research output: Contribution to journal › Article

Abstract

Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand the data and make meaningful decisions. We also benchmark these principled methods against better-known visualization approaches, namely principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets that arise during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmark methods allow. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool helps the domain experts explore the projections obtained from the visualization algorithms, providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software. © 2006 American Chemical Society.
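
As a rough illustration of one of the benchmark methods named in the abstract, the sketch below projects a small synthetic five-dimensional data set onto its two leading principal components using only the Python standard library. It is not the authors' implementation and does not reproduce GTM or HGTM. The data, the group labels `target_A` and `target_B`, and all function names are assumptions made for this example.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(mat, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [dot(row, v) for row in mat]


def center(rows):
    """Subtract the per-column mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, rng, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    v = [rng.gauss(0.0, 1.0) for _ in mat]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(mat, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v, dot(v, matvec(mat, v))


def pca_2d(rows, seed=0):
    """Return 2-D coordinates of rows along the two leading principal axes."""
    rng = random.Random(seed)
    x = center(rows)
    cov = covariance(x)
    v1, lam1 = top_eigenpair(cov, rng)
    d = len(cov)
    # Deflation: subtract the first component so the next power iteration
    # converges to the second principal axis.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, _ = top_eigenpair(deflated, rng)
    points = [(dot(r, v1), dot(r, v2)) for r in x]
    return points, (v1, v2)


# Synthetic stand-in for screening data (assumption): two groups of 50
# "compounds" that differ in the first three of five descriptors.
rng = random.Random(42)
offsets = {"target_A": 3.0, "target_B": -3.0}
data, labels = [], []
for label, shift in offsets.items():
    for _ in range(50):
        row = [rng.gauss(shift, 1.0) for _ in range(3)]
        row.append(rng.gauss(0.0, 2.0))  # noisy descriptor unrelated to group
        row.append(rng.gauss(0.0, 1.0))  # second unrelated descriptor
        data.append(row)
        labels.append(label)

points, components = pca_2d(data)
```

Each entry of `points` is an (x, y) pair that could be passed to any scatter-plot routine and colored by the matching entry of `labels`. PCA is a linear projection, so a sketch like this only hints at why the abstract contrasts it with nonlinear methods such as GTM.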

Original language: English
Pages (from-to): 1806-1818
Number of pages: 13
Journal: Journal of Chemical Information and Modeling
Volume: 46
Issue number: 4
DOI: https://doi.org/10.1021/ci050471a
Publication status: Published - 2006

Fingerprint

Data visualization
Visualization
drug
expert
Self-organizing maps
Bioactivity
Principal component analysis
Screening
chemist
Drug Discovery
projection
paradigm
Pharmaceutical Preparations
efficiency
Industry
Experiments

Keywords

  • segmentation
  • hierarchical visualisation
  • local predictive models
  • high-throughput screening

Cite this

Maniyar, Dharmesh M.; Nabney, Ian T.; Williams, Bruce S.; Sewing, Andreas. / Data visualization during the early stages of drug discovery. In: Journal of Chemical Information and Modeling. 2006; Vol. 46, No. 4. pp. 1806-1818.
@article{2f14588b16c24903be0c5b72487ee010,
title = "Data visualization during the early stages of drug discovery",
keywords = "segmentation, hierarchical visualisation, local predictive models, high-throughput screening",
author = "Maniyar, {Dharmesh M.} and Nabney, {Ian T.} and Williams, {Bruce S.} and Sewing, Andreas",
year = "2006",
doi = "10.1021/ci050471a",
language = "English",
volume = "46",
pages = "1806--1818",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "4",
}

Maniyar, DM, Nabney, IT, Williams, BS & Sewing, A 2006, 'Data visualization during the early stages of drug discovery', Journal of Chemical Information and Modeling, vol. 46, no. 4, pp. 1806-1818. https://doi.org/10.1021/ci050471a

TY - JOUR
T1 - Data visualization during the early stages of drug discovery
AU - Maniyar, Dharmesh M.
AU - Nabney, Ian T.
AU - Williams, Bruce S.
AU - Sewing, Andreas
PY - 2006
Y1 - 2006
KW - segmentation
KW - hierarchical visualisation
KW - local predictive models
KW - high-throughput screening
UR - http://www.scopus.com/inward/record.url?scp=33746893423&partnerID=8YFLogxK
UR - http://pubs.acs.org/doi/full/10.1021/ci050471a
U2 - 10.1021/ci050471a
DO - 10.1021/ci050471a
M3 - Article
VL - 46
SP - 1806
EP - 1818
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
SN - 1549-9596
IS - 4
ER -