Single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), account for about 50% of the gene lesions known to be related to inherited diseases. Through large-scale efforts such as the HapMap project and The Cancer Genome Atlas (TCGA), SAP data are accumulating rapidly in databases such as dbSNP, HGVBase, the Swiss-Prot variant pages, and many allele-specific databases. This growth creates both the opportunity and the need to understand and predict the disease association of SAPs.


SAPRED, the SAP disease-association predictor, offers researchers an automated pipeline for predicting the disease association of SAPs. Compared with similar tools, SAPRED uses several novel attributes, such as Structural Neighbor Profile and Nearby Functional Sites, in addition to well-established attributes such as Residue Frequency and Conservation. These attributes are fed to an internal, pre-trained SVM classifier, and SAPRED reports the final prediction together with its likelihood. The attribute values themselves are also reported because of their potential biological significance.


SAPRED currently offers two types of prediction. One uses both structural and sequence information; the other relies on sequence information only. The former aims at higher prediction accuracy and provides more attributes with putative biological insight, while the latter covers many more inputs, including those for which no structural model is currently available.


The sequence-based SAPRED requires only a protein sequence in FASTA format and a valid mutation name for the SAP as input. If you can also provide two homology-modeled structure files in PDB format, one for the wild-type protein and one for the variant, you can choose SAPRED with structural information support. The structure models should be of high quality. For example, our dataset used structure files from ModSNP, in which every structure is modeled from a target-template alignment with over 70% identity and should therefore be of sufficient quality. Structure models can be prepared using Swiss-Model or Modeller.
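Before submitting, it can help to check locally that the mutation name agrees with the sequence. The following is a minimal sketch, not part of SAPRED itself. It assumes the mutation name has the form wild-type residue, 1-based position, variant residue (for example "K2R"); the exact format SAPRED accepts is not stated here, and the function names `read_fasta_sequence` and `check_mutation` are hypothetical.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# ASSUMPTION: mutation names look like "K2R" (wild-type residue,
# 1-based position, variant residue). SAPRED's real format may differ.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def read_fasta_sequence(text):
    """Return the first sequence in a FASTA-formatted string, without its header."""
    residues = []
    seen_header = False
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header or residues:
                break  # a second record starts here; keep only the first
            seen_header = True
            continue
        if line:
            residues.append(line.upper())
    return "".join(residues)


def check_mutation(sequence, mutation):
    """Validate a mutation name against a sequence.

    Returns (wild_type, position, variant) or raises ValueError.
    """
    match = MUTATION_RE.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation name: {mutation!r}")
    wild, pos_text, variant = match.groups()
    position = int(pos_text)
    if wild not in AMINO_ACIDS or variant not in AMINO_ACIDS:
        raise ValueError("mutation uses a non-standard residue code")
    if wild == variant:
        raise ValueError("wild-type and variant residues are identical")
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != wild:
        raise ValueError(
            f"sequence has {sequence[position - 1]} at position {position}, not {wild}"
        )
    return wild, position, variant
```

For instance, `check_mutation(read_fasta_sequence(fasta_text), "K2R")` succeeds only if the second residue of the first FASTA record is lysine; otherwise it raises a `ValueError` describing the mismatch.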

Citation:

Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP)
Zhi-Qiang Ye; Shu-Qi Zhao; Ge Gao; Xiao-Qiao Liu; Robert E. Langlois; Hui Lu; Liping Wei
Bioinformatics 2007 23(12):1444-1450; doi:10.1093/bioinformatics/btm119

The SAPRED system is under ongoing development. We appreciate your suggestions and bug reports.

 

Copyright © 2006-2007, CBI. All Rights Reserved.