VaxiJen Dataset of Bacterial Immunogens: An Update

Nevena Zaharieva, Ivan Dimitrov, Darren R. Flower, Irini Doytchinova

Research output: Contribution to journal › Article

Abstract

Background: Identifying immunogenic proteins is the first stage in vaccine design and development. VaxiJen is the most widely used and highly cited server for immunogenicity prediction. As the developers of VaxiJen, we are obliged to update and improve it regularly. Here, we present an updated dataset of bacterial immunogens containing 317 sequences of experimentally proven immunogenic proteins of bacterial origin, of which 60% were reported during the last 10 years.

Methods: PubMed was searched for papers reporting novel immunogenic proteins tested on humans, covering publications up to March 2017. The corresponding protein sequences were collected from NCBI and UniProtKB. The set was manually curated to resolve multiple protein fragments, isoforms, and duplicates.

Results: The final curated dataset consists of 306 immunogenic proteins tested on humans, derived from 47 bacterial microorganisms. Certain proteins have several isoforms; all were included, giving 317 protein sequences in total. Compared to the previous VaxiJen bacterial dataset, the updated set contains 206 new immunogens. The average number of immunogens per species is 6.7. The set also contains 12 fusion proteins and 41 peptide fragments and epitopes. The dataset includes the names of the bacterial microorganisms, the protein names, and the protein sequences in FASTA format.

Conclusion: The updated VaxiJen bacterial dataset is currently the best known manually curated compilation of bacterial immunogens. It is freely available at http://www.ddg-pharmfac.net/vaxijen/dataset. It can easily be downloaded, searched, and processed. When combined with an appropriate negative dataset, this update could also serve as a training set, allowing enhanced prediction of the potential immunogenicity of unknown protein sequences.
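
Because the dataset is distributed as FASTA text, processing it can be sketched in a few lines of Python. The following is a minimal illustration, not the authors' tooling. The header layout `organism|protein name`, the function names `read_fasta` and `immunogens_per_species`, and the three example records are all assumptions made for this sketch; the actual header format of the downloadable file should be checked before reuse.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def immunogens_per_species(records, sep="|"):
    """Count records per organism.

    ASSUMPTION: the organism name is the first `sep`-delimited field of
    the header. This layout is hypothetical, not taken from the dataset.
    """
    return Counter(header.split(sep, 1)[0].strip() for header, _ in records)


# Made-up records for illustration only; not entries from the dataset.
example = StringIO(
    ">Organism A|protein 1\n"
    "MKKLLA\n"
    "GGH\n"
    ">Organism A|protein 2\n"
    "MTTQ\n"
    ">Organism B|protein 3\n"
    "MSSR\n"
)

records = list(read_fasta(example))
counts = immunogens_per_species(records)
mean_per_species = len(records) / len(counts)

# Figures reported in the abstract: 317 sequences from 47 organisms.
reported_mean = round(317 / 47, 1)
```

On the made-up input, `mean_per_species` is 1.5 (three sequences over two organisms). Applying the same ratio to the reported figures, 317 sequences over 47 organisms, gives about 6.7, which is consistent with the average stated in the Results.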

Original language: English
Pages (from-to): 398-400
Number of pages: 3
Journal: Current Computer-Aided Drug Design
Volume: 15
Issue number: 5
Early online date: 18 Mar 2019
DOIs: https://doi.org/10.2174/1573409915666190318121838
Publication status: Published - 1 Oct 2019

Keywords

  • FASTA
  • Immunogenicity prediction
  • VaxiJen
  • bacterial immunogen
  • dataset
  • epitopes

Cite this

Zaharieva, N., Dimitrov, I., Flower, D. R., & Doytchinova, I. (2019). VaxiJen Dataset of Bacterial Immunogens: An Update. Current Computer-Aided Drug Design, 15(5), 398-400. https://doi.org/10.2174/1573409915666190318121838
@article{05e6cbe9aaba4f2fb717ff39a329f237,
title = "VaxiJen Dataset of Bacterial Immunogens: An Update",
keywords = "FASTA, Immunogenicity prediction, VaxiJen, bacterial immunogen, dataset, epitopes",
author = "Nevena Zaharieva and Ivan Dimitrov and Flower, {Darren R.} and Irini Doytchinova",
year = "2019",
month = "10",
day = "1",
doi = "10.2174/1573409915666190318121838",
language = "English",
volume = "15",
pages = "398--400",
journal = "Current Computer-Aided Drug Design",
issn = "1573-4099",
publisher = "Bentham Science Publishers B.V.",
number = "5",
}

TY - JOUR

T1 - VaxiJen Dataset of Bacterial Immunogens: An Update

AU - Zaharieva, Nevena

AU - Dimitrov, Ivan

AU - Flower, Darren R.

AU - Doytchinova, Irini

PY - 2019/10/1

Y1 - 2019/10/1

KW - FASTA

KW - Immunogenicity prediction

KW - VaxiJen

KW - bacterial immunogen

KW - dataset

KW - epitopes

UR - http://www.eurekaselect.com/170795/article

UR - http://www.scopus.com/inward/record.url?scp=85073576959&partnerID=8YFLogxK

U2 - 10.2174/1573409915666190318121838

DO - 10.2174/1573409915666190318121838

M3 - Article

VL - 15

SP - 398

EP - 400

JO - Current Computer-Aided Drug Design

JF - Current Computer-Aided Drug Design

SN - 1573-4099

IS - 5

ER -