Correct prediction of the structure of protein-coding genes of higher eukaryotes is
a difficult task; as a result, public sequence databases incorporating predicted sequences
are increasingly contaminated with erroneous sequences. The high rate of misprediction
has serious consequences since it significantly affects the conclusions that may be
drawn from genome-scale sequence analyses. Here we describe the MisPred and FixPred
approaches, which may help to identify and correct erroneous sequences.
The rationale of these approaches is that a protein sequence is likely to be erroneous
if some of its features conflict with our current knowledge about proteins.
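
As a rough illustration of this rationale, the sketch below flags a sequence as suspect when it violates any of a few simple expectations. The function names, the three rules, and the length threshold are hypothetical and chosen only for illustration; they are not the rules actually used by MisPred or FixPred.

```python
from typing import List

# The 20 standard amino acids in one-letter code.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def find_conflicts(sequence: str, min_length: int = 30) -> List[str]:
    """Return the names of the illustrative rules that the sequence violates.

    The rules and the default threshold are assumptions made for this sketch.
    """
    seq = sequence.strip().upper()
    conflicts = []
    # Assumed rule: a complete protein is expected to start with methionine.
    if not seq.startswith("M"):
        conflicts.append("missing_initiator_methionine")
    # Assumed rule: very short predictions are treated as likely fragments.
    if len(seq) < min_length:
        conflicts.append("too_short")
    # Assumed rule: only the 20 standard residues are expected.
    if set(seq) - STANDARD_RESIDUES:
        conflicts.append("nonstandard_residue")
    return conflicts


def is_suspect(sequence: str, min_length: int = 30) -> bool:
    """A sequence is suspect if at least one rule is violated."""
    return bool(find_conflicts(sequence, min_length))
```

A sequence that passes every rule is not thereby shown to be correct; it simply shows no conflict with the expectations that were encoded.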