Domains are conserved structural entities with distinctive secondary-structure content and a hydrophobic core. In small disulphide-rich domains and in Zn2+- or Ca2+-binding domains, the hydrophobic core may be provided by cystines and metal ions, respectively. Homologous domains with common functions usually show sequence similarities.
Alignment scores are reported by HMMer and BLAST as bit scores. The likelihood that the query sequence is a bona fide homologue of the database sequence is compared with the likelihood that the sequence was instead generated by a “random” model. Taking the base-2 logarithm of this likelihood ratio gives the bit score.
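As an illustration of the log-odds idea only, the following minimal sketch computes a bit score for a short, gap-free sequence. It assumes a hypothetical "homologue" model given as per-position residue probabilities (the values in `HOMOLOGUE_MODEL` are invented) and a "random" model in which each of the 20 amino acids is equally likely. Real HMMer and BLAST scoring also handles gaps, pseudocounts and non-uniform background frequencies, none of which is modelled here.

```python
import math

# Hypothetical per-position residue probabilities for a short, gap-free
# "homologue" model (values invented for illustration only).
HOMOLOGUE_MODEL = [
    {"C": 0.8, "S": 0.1, "A": 0.1},
    {"G": 0.6, "A": 0.3, "S": 0.1},
    {"W": 0.9, "F": 0.1},
]

# Assumed "random" model: every amino acid equally likely.
BACKGROUND_PROB = 1.0 / 20


def bit_score(seq: str) -> float:
    """Return log2( P(seq | homologue model) / P(seq | random model) )."""
    if len(seq) != len(HOMOLOGUE_MODEL):
        raise ValueError("sequence length must match the model length")
    score = 0.0
    for residue, column in zip(seq, HOMOLOGUE_MODEL):
        p_hom = column.get(residue, 0.0)
        if p_hom == 0.0:
            # Residue impossible under this toy model (real models use pseudocounts).
            return float("-inf")
        # Summing per-position log2 ratios equals log2 of the product of the ratios.
        score += math.log2(p_hom / BACKGROUND_PROB)
    return score


print(bit_score("CGW"))  # strongly favoured by the homologue model
print(bit_score("AAW"))  # weaker, but still positive
```

A positive score means the homologue model explains the sequence better than the random model, and each additional bit corresponds to a further factor of two in the likelihood ratio.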
The P-value represents the probability that, given a database of a particular size, random sequences score higher than a value X. P-values are generated by the BLAST algorithm, which has been integrated into SMART.
The E-value represents the number of sequences with a score greater than or equal to X that are expected purely by chance. It relates the score (X) of an alignment between a user-supplied sequence and a database sequence, generated by any algorithm, to the number of alignments with similar or greater scores that would be expected from a search of a random sequence database of equivalent size. Since version 2.0, E-values are calculated using Hidden Markov Models, which gives more accurate estimates than before.
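To make the relationship between score, database size, E-value and P-value concrete, here is a minimal sketch. It assumes the Karlin–Altschul statistics used by BLAST, in which the E-value for a bit score S' is E = m · n · 2^(-S') (m = query length, n = total database length in residues, no edge-effect correction), and P = 1 - e^(-E) under a Poisson model of chance hits. It is not SMART's HMM-based E-value calculation, and the function names and example numbers are hypothetical.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring >= bit_score.

    Karlin-Altschul form without edge-effect correction: E = m * n * 2**(-S').
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e: float) -> float:
    """Probability of at least one chance alignment scoring >= X,
    assuming chance hits are Poisson-distributed with mean E: P = 1 - exp(-E).
    """
    # -expm1(-e) equals 1 - exp(-e) but stays accurate when e is tiny.
    return -math.expm1(-e)


# Hypothetical search: 300-residue query against a 10**8-residue database.
e = evalue_from_bits(40.0, 300, 10**8)
print(e, pvalue_from_evalue(e))
```

Two consequences follow from these formulas: every extra bit of score halves the E-value, and for small E-values the P-value is almost equal to the E-value, while for large E-values it approaches 1.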
Sequence motifs are short conserved regions of polypeptides. Sets of sequence motifs need not represent homologues.
A profile is a table of position-specific scores and gap penalties, representing a homologous family, that may be used to search sequence databases. In CLUSTAL-W-derived profiles, sequences that are more distantly related are assigned higher weights. Issues in profile-based database searching are discussed in Bork & Gibson (1996).
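The sketch below shows, in simplified form, how a table of position-specific scores can be used to scan a sequence. It is a minimal illustration under stated assumptions: `PROFILE`, `DEFAULT_SCORE` and the helper functions are hypothetical, the scores are invented, and gap penalties and sequence weighting are deliberately left out, so only ungapped matches are found.

```python
# Hypothetical 3-column profile: position-specific scores (invented values).
# Residues not listed in a column receive DEFAULT_SCORE.
PROFILE = [
    {"C": 3.0, "S": 0.5},
    {"G": 2.5, "A": 1.0},
    {"W": 4.0, "F": 1.5, "Y": 1.0},
]
DEFAULT_SCORE = -1.0


def score_window(window: str) -> float:
    """Sum the position-specific score of each residue in an ungapped window."""
    return sum(column.get(res, DEFAULT_SCORE) for res, column in zip(window, PROFILE))


def best_hit(seq: str) -> tuple[int, float]:
    """Slide the profile along seq without gaps; return (start, score) of the best window."""
    width = len(PROFILE)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the profile")
    candidates = (
        (start, score_window(seq[start:start + width]))
        for start in range(len(seq) - width + 1)
    )
    # On ties, max() keeps the earliest window.
    return max(candidates, key=lambda hit: hit[1])


print(best_hit("AACGWAA"))  # the "CGW" window starting at index 2 scores highest
```

A full profile search would additionally apply the profile's gap penalties through dynamic programming, so that insertions and deletions relative to the family can be accommodated.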