PatMinr features four different state-of-the-art algorithms to extract all substring repetitions in a given sequence of characters. The four methods are the following:
- BiDESt is an adaptation of the BIDE method (Wang et al., 2007) for substring pattern mining.
- BiDEStOn is a new online version of BiDESt, specifically developed for PatMinr.
- UkkoClos constructs a suffix tree (Ukkonen, 1995) and selects the closed patterns from the tree.
- ClosUkko is a brand new approach that is faster than the three other methods in some cases. (Scientific paper currently under review)
Simply enter a sequence, select one of the four methods, and press "Analyze". The result is shown in the form of a "Closed pattern tree".
For instance, for "abracadabra", we obtain
"a
∟bra"
which means that "a" is a closed pattern, and "abra" is another closed pattern.
"bra", for instance, is not a closed pattern, because it is simply a suffix of "abra" repeated the same number of times.
For the text "abracadabra, banana", we obtain
"a
∟bra
∟na
b"
which means that the closed patterns are "a", "abra", "ana" and "b".
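The two results above can be reproduced with a minimal brute-force sketch. It is not one of the four PatMinr methods: it counts every substring directly, so it is only practical for short inputs. The function names (`count_occurrences`, `closed_patterns`, `render_pattern_tree`) are hypothetical, the threshold of two occurrences is an assumption based on the examples listing only repeated patterns, and the tree layout (each pattern placed under its longest closed prefix, showing only the remaining characters) is inferred from the two examples rather than taken from PatMinr itself.

```python
from collections import defaultdict


def count_occurrences(text: str) -> dict[str, int]:
    """Count every substring of text, overlapping occurrences included."""
    counts: dict[str, int] = defaultdict(int)
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            counts[text[start:end]] += 1
    return counts


def closed_patterns(text: str, min_count: int = 2) -> list[str]:
    """Return the repeated substrings that no longer substring matches in count."""
    counts = count_occurrences(text)
    alphabet = set(text)
    closed = []
    for pattern, n in counts.items():
        if n < min_count:
            continue
        # If any longer substring had the same count, some one-character
        # extension (left or right) would have it too, so checking those suffices.
        has_equal_extension = any(
            counts.get(c + pattern, 0) == n or counts.get(pattern + c, 0) == n
            for c in alphabet
        )
        if not has_equal_extension:
            closed.append(pattern)
    return sorted(closed)


def render_pattern_tree(patterns: list[str]) -> str:
    """Assumed layout: nest each pattern under its longest prefix in the list."""
    lines = []
    ancestors: list[str] = []  # prefixes of the pattern currently being placed
    for pattern in sorted(patterns):
        while ancestors and not pattern.startswith(ancestors[-1]):
            ancestors.pop()
        if ancestors:
            indent = "  " * (len(ancestors) - 1)
            lines.append(indent + "∟" + pattern[len(ancestors[-1]):])
        else:
            lines.append(pattern)
        ancestors.append(pattern)
    return "\n".join(lines)


print(render_pattern_tree(closed_patterns("abracadabra, banana")))
```

Under these assumptions, `closed_patterns("abracadabra")` yields "a" and "abra", and the printed tree for "abracadabra, banana" matches the one shown above.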
Scientific papers:
J. Wang, J. Han, and C. Li, “Frequent Closed Sequence Mining without Candidate Maintenance,” IEEE Trans. Knowl. Data Eng., vol. 19, no. 8, 2007, pp. 1042–1056.
E. Ukkonen, “On-Line Construction of Suffix Trees,” Algorithmica, vol. 14, no. 3, 1995, pp. 249–260.
O. Lartillot and D. Meredith, paper currently under review.