This work develops a pattern filtering approach for analyzing genomic sequences.
With this approach, the distances between successive occurrences of a given pattern
are first translated into a "gap sequence" of integers. Different patterns result in different gap
sequences, and the similarity of two genomic sequences can be measured by processing
the gap sequences generated by a set of pre-selected patterns. A matched
filtering approach is applied to gap sequences. Furthermore, several post-processing
techniques are applied to the filtered result for signal enhancement. For example,
the modified Butterworth window (MBW) is used to remove the edge effect of the matched
filter output, and the uncertain region is handled by the advanced similarity
test (AST) algorithm. The match between gap sequences is called a "frame match". The
actual match of two genomic sequences demands both frame match and stuffing match.
The proposed approach is useful for sequence analysis based on the frame match with
desirable patterns. Extensive experimental results are presented to demonstrate the
performance of the proposed method.
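
The first two steps described above, building a gap sequence and applying a matched filter to it, can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: a gap is taken to be the distance between the start positions of successive (possibly overlapping) occurrences of a pattern, and the matched filter is realized as a normalized sliding correlation. The function names `gap_sequence` and `matched_filter` are hypothetical, and the MBW and AST post-processing stages are not shown.

```python
import numpy as np


def gap_sequence(seq: str, pattern: str) -> list[int]:
    """Distances between start positions of successive occurrences of `pattern`.

    Assumption: overlapping occurrences are counted.
    """
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)  # step by 1 to allow overlaps
    return [b - a for a, b in zip(positions, positions[1:])]


def matched_filter(signal, template) -> np.ndarray:
    """Normalized sliding correlation of `signal` against `template`.

    Assumption: this stands in for the matched filter of the text.
    Each output value is the cosine similarity between the template and
    one window of the signal, so it reaches 1.0 when the window is
    proportional to the template (an exact copy is one such case).
    Requires len(signal) >= len(template) and strictly positive gaps.
    """
    s = np.asarray(signal, dtype=float)
    t = np.asarray(template, dtype=float)
    raw = np.correlate(s, t, mode="valid")  # sliding dot product
    window_energy = np.convolve(s ** 2, np.ones(len(t)), mode="valid")
    return raw / np.sqrt(window_energy * np.dot(t, t))


# Illustrative data only, not taken from the experiments.
template_gaps = gap_sequence("ATGCCATGAAATGTATG", "ATG")
response = matched_filter([3, 5, 5, 4, 7, 2], template_gaps)
best_offset = int(np.argmax(response))
```

In this sketch a peak in `response` marks a candidate frame match between two gap sequences. Because the normalization makes the response insensitive to a uniform scaling of the gaps, a peak alone does not establish that the gaps are equal, and it says nothing about the stuffing match between the pattern occurrences.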