A Dynamic Programming Approach for Computing Similarity of the Protein Sequences Based on Continuous Functions Comparison
Abstract
This paper introduces a dynamic programming approach for computing "continuous" similarity of the two protein sequences. The discrete dynamic programming method considers items of each comparable sequence independently; meantime there is a strong interrelation between them. To overcome this disadvantage a "continuous" sequence comparison method is developed. Particularly, a certain continuous function is correlated to each comparable protein sequence, and then the comparison is made between those functions. Through compressions and expansions the comparable functions are brought to the most similar representation in the meaning of a certain similarity function. By this approach the sequence comparison problem is reduced to a functional maximization problem, which is numerically solved using dynamic programming method. Finally some practical results are presented with the application of described method.
References
Alexeev V.M., Tikhomirov V.M. and Fomin S.V., Optimal Control., Nauka, 1979.
Alstchul S.F., Glish W., Miller W., Myers E.W. and Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 215, 403-410, 1990.
Bairoch A. and Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acid Res., 27, 49-54, 1999.
Baldi P. and Brunak S. Bioinformatics, The Machine Learning Apprach. MIT Press, 2001.
Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R., Griffiths-Jones S., Howe K.L., Marshall M. and Sonnhammer L.L.E. The Pfam protein families database. Nucleic Acids Research, vol. 30, no. 1, 276-280, 2002.
Bellman R. Dynamic Programming. Prinston Univ. Press, 1957.
Durbin R. Eddy S.R., Krogh A., Mitchison G. Biological Sequence analysis. Cambridge University Press, 1998.
Eddy S.R. Profile hidden Markov models. Bioinformatics, vol. 14, no. 9, 755-763, 1998.
Gevorgyan R.K., A Continuous Method for Evaluating Dissimilarity of the Protein Sequences. Proceedings of the ISAAC Conference on Analysis, Yerevan, Armenia, 29-40, 2004.
Gusfield D. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
Heymann S., Gabrielyan O. R., Ghazaryan H. and others. Towards a Metrical Space of Biological Sequences. Proceedings of the ISAAC Conference on Analysis, Y erevan, Armenia, 1-18, 2004.
Horst R., Pardalos P.M. and Thoai N.V. Introduction to global optimization. Kluwer Academic Publishers, 1995.
Kantorovich L.V. and Rubinstein G.S. On a function space and certain extremum problem. Dokl.Akad. Nauk SSSR, N 5, 115, 1058-1061, 1957.
Levenstein V.I. Binary codes capable of correcting insertions and reversals. Sov. Phys. Dokl., 10:707-710, 1966.
Mikhalevich V.S. Sequential optimization algorithms and their applications., Kibernetika, N 1, 2, 1965.
Moiseev N. N. Calculus of approximations in the theory of optimal tasks. Nauka, Moscow, 1971.
Needelman S.B. and Wunsch C.D. A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol., 48, 443-453, 1970.
Pearson W.R. and Lipman D.J. Improved tools for biological sequence comparison. Proc. Nat. Acad Sci. USA, 85, 2444-2448, 1988.
Rabiner R. and Juang B.-H. Fundamentals of speech recognition. Prentice Hall PTR Englewood Cliffs, New Jersey 07632, 1993.
Sankoff D. and Kruskal J.B. Time Warps, String Edits and Macromolecules. CSLI Publications, 1997, ISBN 1-57586-21 7-4.
Setubal J.C. and Meidanis J. Introduction to computational molecular biology. PWS Publishing company, 1997.
Smith T.F. and Waterman M.S. Identification of common molecular subsequences. Journal of Molecular Biology, 147: 195-197, 1981.
Wasserstein L.N. Markov processes over denumerable products of spaces describing large systems of automata. Problems of Information Transission 5, 47-52, 1969.
Waterman M. Introduction to computational biology. Chapman and Hall, 1995.
Downloads
Published
How to Cite
Issue
Section
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.