A novel approach to approximate order preserving matching

Authors

  • Juan Carlos Mendivelso Moreno Universidad Nacional http://orcid.org/0000-0002-4141-7393 (unauthenticated)
  • Rafael Alberto Niquefa Velásquez Institución Universitaria Politécnico Grancolombiano
  • Yoán José Pinzón Ardila Universidad Pontificia Javeriana
  • Germán Jairo Hernández Pérez Universidad Nacional

DOI:

https://doi.org/10.22579/20112629.429

Keywords:

String searching, Experimental algorithm analysis, Strings similarity metric, String searching algorithms

Abstract

A problem with important applications in stock market analysis and music information retrieval is order-preserving matching. This problem is a recently introduced variant of the string matching problem that searches for substrings in the text whose natural representation matches the natural representation of the pattern. The natural representation of a string X is a string that contains the rankings of the characters occurring at each position of X. Then, order-preserving matching regards the internal structure of the strings rather than their absolute values. But both stock market analysis and music information retrieval require more flexibility: not only the substrings with exactly the same structure are of interest, but also the ones that are similar. In this paper, we propose an approximate version of order-preserving matching based on the δγ- distances that permit an individual error between the ranking of corresponding symbols (bounded by δ) and a global error of all the positions (bounded by γ). We present an algorithm that solves this problem in O(nm+m log m). Experimental results verify the efficiency of the proposed algorithm.

Downloads

Download data is not yet available.

References

Apostolico A, Galil Z. 1997. Pattern matching algorithms. Oxford University Press, USA.

Cambouropoulos E, Crochemore M, Iliopoulos C, Mouchard L, Pinzon Y. Algorithms for computing approximate repetitions in musical sequences. International Journal of Computer Mathematics, 2002;79(11):1135–1148.

Crochemore M, Iliopoulos C, Lecroq T, Plandowski W, Rytter W. 2002. Three Heuristics for δ-Matching, δ-BM Algorithms. Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching. Springer-Verlag London, UK,178–189.

Crochemore M, et al. Occurrence and Substring Heuristics for d-Matching. Fundamenta Informaticae. 2003;56(1):1–21.

Clifford R, Iliopoulos C. Approximate string matching for music analysis. Soft Computing, 2004;8(9):597–603.

Cantone D, Cristofaro S, Faro S. 2004. Efficient Algorithms for the δ- Approximate String Matching Problem in Musical Sequences. Proc. of the Prague Stringology Conference.

Crochemore M, Iliopoulos C, Navarro G, Pinzon Y, Salinger A. Bit-parallel (δ,γ)-Matching and Suffix Automata. Journal of Discrete Algorithms. 2005;3( 2-4):198–214.

Lee I, Clifford R, Kim S-R. 2006. Algorithms on extended (δ,γ)-matching. Computational Science and Its Applications-ICCSA. Springer. 1137–1142.

Lee I, Mendivelso J, Pinzon Y. 2008. δγ–parameterized matching. String Processing and Information Retrieval. Springer. 236–248.

Mendivelso J. 2010. Definition and solution of a new string searching variant termed δγ–parameterized matching. Master’s thesis. Universidad Nacional de Colombia.

Mendivelso J, Lee I, Pinzon Y. 2012. Approximate function matching under δ-and γ-distances. String Processing and Information Retrieval. Springer. 348–359.

Mendivelso J, Pino C, Niño L, Pinzon Y. Approximate abelian periods to find motifs in biological sequences. Lecture Notes in Bioinformatics, Computational Intelligence Methods for Bioinformatics and Biostatistics. 2015;8623:121-130.

Mendivelso J, Pinzon Y. 2014. A novel approach to approximate parikh matching for comparing composition in biological sequences. Proceedings of the 6th International Conference on Bioinformatics and Computational Biology.

Kim J, Eades P, Fleischer R, Hong S H, Iliopoulos C S, Park K, Puglisi S J, Tokuyama T. Order-preserving matching. Theoretical Computer Science. 2014;525:68–79.

Kubica M, Kulczynski T, Radoszewski J, Rytter W, Walen T. A linear time algorithm for consecutive permutation pattern matching. Information Processing Letters. 2013;113(12):430–433.

Crochemore M, Iliopoulos CS, Kociumaka T, Kubica M, et al. 2013. Order preserving suffix trees and their algorithmic applications. arXiv preprint arXiv:1303.6872.

Crochemore M, Iliopoulos CS, Kociumaka T, Kubica M, Langiu A, Pissis SP, Radoszewski J, Rytter W, Walen T. 2013a. Order-preserving incomplete suffix trees and order-preserving indexes. String Processing and Information Retrieval. Springer. 84–95.

Chhabra T, Tarhio J. 2014. Order-preserving matching with filtration. Experimental Algorithms. Springer. 307–314.

Faro S, Kulekci O. 2015. Efficient algorithms for the order preserving pattern matching problem. arXiv preprint arXiv:1501.04001.

Crochemore M, Iliopoulos CS, Kociumaka T, Kubica M, Langiu A, Pissis SP, Radoszewski J, Rytter W, Walen T. 2015. Order-preserving indexing. Theoretical Computer Science.

Hasan M M, Islam AS, Rahman MS, Rahman MS. Order preserving pattern matching revisited. Pattern Recognition Letters. 2015;55:15–21.

Chhabra T, Kulekci MO, Tarhio J. 2015. Alternative algorithms for order-preserving matching. Proceedings of the Prague Stringology Conference. 36–46.

Gawrychowski P, Uznanski P. 2015. Order-preserving pattern matching with k mismatches. Theoretical Computer Science.

Downloads

Published

2017-07-16

Issue

Section

Articles

How to Cite

A novel approach to approximate order preserving matching. (2017). Orinoquia, 21(1 Sup), 37-44. https://doi.org/10.22579/20112629.429

Similar Articles

1-10 of 49

You may also start an advanced similarity search for this article.