Most operations on the SAPRED website are straightforward, and the necessary explanations and links are provided on the job submission page and the result page. Users can perform all the essential operations without logging in, but we still recommend registering a free account and logging in. After logging in, users can manage their data on the server, query their job history, and so on. Each user has a separate user space that cannot be accessed by others. Several terms used in SAPRED, and the operations on remote files and directories in the user's work space, are described below.
"A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than ('>') symbol in the first column." (NCBI) For more information about the FASTA format, refer to the NCBI FASTA format description.
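The FASTA rules quoted above are simple enough to parse by hand. The following Python sketch is illustrative only and is not SAPRED's own input handling; the function name parse_fasta and the choices to skip blank lines and upper-case the sequence letters are assumptions.

```python
from typing import Dict, Iterable, List, Optional


# Hypothetical helper for illustration, not part of SAPRED.
def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {description: sequence} dictionary.

    A line whose first column is '>' starts a new record; the following
    lines, up to the next '>' line, hold the sequence data.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip()
        if not line.strip():
            continue  # skip blank lines (assumption)
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.strip().upper())  # upper-casing is an assumption
    if header is not None:
        records[header] = "".join(chunks)  # close the last record
    return records


example = """>example_protein hypothetical test sequence
MKTAYIAKQR
QISFVKSHFS"""
print(parse_fasta(example.splitlines()))
# {'example_protein hypothetical test sequence': 'MKTAYIAKQRQISFVKSHFS'}
```

Because the result is a dictionary keyed by the description line, a repeated description would overwrite the earlier record in this sketch.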
SAP stands for single amino acid polymorphism, and SNP for single nucleotide polymorphism. An nsSNP is a non-synonymous SNP, which causes an amino acid substitution in the protein product. If population frequency information is disregarded, SAP, nsSNP and missense mutation are synonyms.
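To make the distinction between synonymous and non-synonymous SNPs concrete, the sketch below applies a single-base change to one codon and compares the encoded amino acids using the standard genetic code. The helper classify_snp and its labels are hypothetical and not part of SAPRED; it reports the gain or loss of a stop codon separately, since those changes do not produce a simple residue substitution.

```python
# Hypothetical helper for illustration, not part of SAPRED.
BASES = "TCAG"
# Standard genetic code: codons enumerated with the first base varying
# slowest, in T, C, A, G order; '*' marks a stop codon.
TABLE_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: TABLE_AA[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change within one codon of a coding sequence.

    position is 1-based within the codon (1, 2 or 3).
    """
    codon = codon.upper().replace("U", "T")
    new_base = new_base.upper().replace("U", "T")
    if codon not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {codon!r}")
    if position not in (1, 2, 3) or len(new_base) != 1 or new_base not in BASES:
        raise ValueError("position must be 1-3 and new_base one of A, C, G, T")
    mutant = codon[: position - 1] + new_base + codon[position:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"  # same amino acid: not an nsSNP
    if after == "*":
        return "nonsense"    # amino acid codon becomes a stop codon
    if before == "*":
        return "stop-loss"   # stop codon becomes an amino acid codon
    return "missense"        # amino acid substitution: an nsSNP / SAP


print(classify_snp("TAT", 1, "C"))  # TAT (Tyr) -> CAT (His): missense
```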
We use the form A###B to represent an amino acid substitution in a specified protein, where A and B are single-letter amino acid codes and ### is a position in the protein sequence. Positions are numbered from 1. For example, Y128H describes the replacement of the 128th residue of the protein, originally TYR (Y), by HIS (H). More details about the mutation nomenclature can be found here.
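The A###B notation is regular enough to check mechanically. The following Python sketch, with the hypothetical names parse_substitution and apply_substitution, parses a code such as Y128H and applies it to a sequence, converting the 1-based position to a 0-based string index and verifying that the residue at that position matches A. Accepting only the 20 standard one-letter codes is an assumption of the sketch, not a documented SAPRED rule.

```python
import re
from typing import NamedTuple

# Hypothetical helpers for illustration, not part of SAPRED.
# One-letter codes of the 20 standard amino acids (assumed allowed set).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


class Substitution(NamedTuple):
    wild_type: str  # A: the original residue
    position: int   # ###: 1-based position in the protein sequence
    mutant: str     # B: the replacing residue


def parse_substitution(code: str) -> Substitution:
    match = SUBSTITUTION.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not in A###B form: {code!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in STANDARD_AA or mutant not in STANDARD_AA:
        raise ValueError(f"unknown amino acid code in {code!r}")
    if position < 1:
        raise ValueError("positions are numbered from 1")
    return Substitution(wild_type, position, mutant)


def apply_substitution(sequence: str, code: str) -> str:
    """Return the mutant sequence after checking the wild-type residue."""
    sub = parse_substitution(code)
    index = sub.position - 1  # 1-based position -> 0-based string index
    if index >= len(sequence):
        raise ValueError(f"position {sub.position} is beyond the sequence end")
    if sequence[index] != sub.wild_type:
        raise ValueError(
            f"residue {sub.position} is {sequence[index]}, not {sub.wild_type}"
        )
    return sequence[:index] + sub.mutant + sequence[index + 1:]


print(parse_substitution("Y128H"))
print(apply_substitution("MKYL", "Y3H"))
```

Checking the wild-type residue catches a common error: a mutation code that was written against a different isoform or numbering of the protein.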
We adopted the "-b 1" parameter in LIBSVM to generate probability estimates, which are reported here as the "Likelihood". The score serves as the prediction confidence: a DISEASE_LIKELIHOOD of 0.9 means the prediction is of high confidence, while a value near 0.5 means the prediction lacks confidence.
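One way to read the likelihood is as a distance from the undecided point 0.5. The helper below is a hypothetical illustration, not SAPRED's code: the NEUTRAL label for the other class, the 0.5 cut-off and the rescaled confidence value are all assumptions. Under this reading, a likelihood near 0.0 is as confident as one near 1.0, only for the opposite class.

```python
from typing import Tuple


# Hypothetical helper for illustration, not part of SAPRED.
def interpret_likelihood(p_disease: float) -> Tuple[str, float]:
    """Read a DISEASE_LIKELIHOOD as (label, confidence).

    The label is the more probable class (ties go to DISEASE), and the
    confidence is the distance from the 0.5 decision point, rescaled so
    that 0.5 -> 0.0 and 0.0 or 1.0 -> 1.0.
    """
    if not 0.0 <= p_disease <= 1.0:
        raise ValueError("a probability estimate must lie in [0, 1]")
    label = "DISEASE" if p_disease >= 0.5 else "NEUTRAL"  # assumed class names
    confidence = abs(p_disease - 0.5) * 2
    return label, confidence


for p in (0.9, 0.52, 0.1):
    label, confidence = interpret_likelihood(p)
    print(f"{p:.2f} -> {label}, confidence {confidence:.2f}")
```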
PDB (Protein Data Bank) and Protein Structure
PDB is a well-known database of macromolecular structural data. Each entry contains the 3D coordinates of a specific molecule or complex at atomic detail. Entries are available in two file formats, PDB and PDBML: the former is more widely used, while the latter (XML-based) is better suited to computer programs. The structures deposited in PDB are determined by X-ray crystallography or NMR experiments. Because these experiments are technically difficult, far fewer structures are available than protein sequences. Fortunately, researchers can predict protein structures by homology modelling (comparative modelling), based on an alignment between the target (a protein without a known structure) and a template (a protein whose structure is available). Homology modelling requires a sequence identity of over 35% in the alignment, and the higher the identity, the more accurate the modelled structure. Two popular modelling programs are Swiss-Model and Modeller.

The Swiss-Prot variant pages collect comprehensive information on protein variants. If these variants can be aligned to a protein with a known structure at a sequence identity exceeding 70%, their structures are modelled and deposited in the ModSNP database. Because the sequence identity is over 70%, the modelled structures should be of high quality. The dataset we used is from ModSNP. The Swiss-Prot variant pages can be accessed through the ExPASy NiceProt view.
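The identity thresholds above can be checked with a few lines of Python. This sketch assumes one common definition of sequence identity (identical positions divided by gap-free aligned columns); SAPRED and ModSNP may compute it differently, and the function names and example alignment rows are hypothetical.

```python
# Hypothetical helpers for illustration, not part of SAPRED or ModSNP.
def sequence_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: identity = identical columns / columns in which neither
    sequence has a gap. Other conventions (e.g. dividing by the length of
    the shorter sequence) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def modelling_guideline(identity: float) -> str:
    """Compare an identity value with the thresholds quoted in the text."""
    if identity > 70:
        return "over 70%: high-quality models expected (ModSNP criterion)"
    if identity > 35:
        return "over 35%: homology modelling feasible"
    return "35% or lower: below the homology modelling threshold"


target = "MKT-AYIAKQR"    # hypothetical target row of an alignment
template = "MKTLAYVAKQR"  # hypothetical template row
identity = sequence_identity(target, template)
print(f"{identity:.1f}% -> {modelling_guideline(identity)}")
```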
How to use the user work space
Make A New Directory
Upload A File
Copyright © 2006-2007, CBI. All Rights Reserved.