Example Program
Maximal Unique Matches
An example of using the Mums Iterator.
Given a set of sequences, a unique match is a match that occurs exactly once in each sequence.
A maximal unique match (MUM) is a unique match that is not part of any longer unique match. The following
example demonstrates how to iterate over all MUMs and output them.
File "index_mums.cpp"
A tutorial about finding Mums.
We begin with a StringSet that stores multiple strings.
To find maximal unique matches (Mums), we use the Mums Iterator
and set the minimum MUM length to 3.
A multiple match can be represented by its length and the positions at which it occurs in every sequence.
getOccurrences returns an unordered sequence of (seqNo, seqOfs) pairs, one for each position
at which the match occurs.
To sort them in ascending order by seqNo, we use orderOccurrences.
repLength returns the length of the match.
The match string itself can be determined with representative.
Output
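The only maximal matches that occur in all 3 sequences are "SeqAn" and "sequence".
Each occurs exactly once in every sequence, so both are maximal unique matches.
In each output line, the first three numbers are the match's offsets in sequences 0, 1, and 2,
followed by the match length and the match string.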
weese@tanne:~/seqan/demos$ make index_mums
weese@tanne:~/seqan/demos$ ./index_mums
0, 53, 33, 5 "SeqAn"
23, 36, 3, 8 "sequence"
weese@tanne:~/seqan/demos$
See Also
SeqAn - Sequence Analysis Library - www.seqan.de