Computational Biology and Bioinformatics
Volume 4, Issue 2, April 2016, Pages: 15-20
Received: May 3, 2016;
Accepted: May 16, 2016;
Published: May 28, 2016
Views 3610 Downloads 153
Saeed Alirezanejad Gohardani, Department of Applied Mathematics, Faculty of Mathematical Science, University of Guilan, Rasht, Iran
Mehri Bagherian, Department of Applied Mathematics, Faculty of Mathematical Science, University of Guilan, Rasht, Iran
Hamid Reza Vaziri, Department of Biology, Faculty of Science, University of Guilan, Rasht, Iran
Motif finding problem is a major challenge in biology with significant applications in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene expression and function, human disease, drug design, etc. Two type of motif finding problems have been investigated. Planted Motif Search Problem (PMSP) which is defined as finding motifs that appear in all sequences and a restricted version of it “Planted Motif Search Problem-Sample Driven” (PMSP-SD) where the motifs themselves are found in the input. The first version is NP-Complete and the second version can be trivially solved in polynomial time. In this paper, a new idea is used to speed up the PMS-SD algorithm. Although PMS-SD is a polynomial time algorithm and the new idea does not improve its asymptotic runtime, but since most of the motif search algorithms combine a sample driven approach with a pattern driven approach, the speed up of PMS-SD running time would result in speed up of PMS algorithm. To verify the performance of the modified algorithms which are called PMS-two step and PMS-SD-two step, these algorithms are tested on simulated data. The experimental results approve the improvements.
Saeed Alirezanejad Gohardani,
Hamid Reza Vaziri,
A New Idea for Improving the Running Time of PMS Algorithm, Computational Biology and Bioinformatics.
Vol. 4, No. 2,
2016, pp. 15-20.
D. Laurent and B. Philipp, "Searching for regulatory elements in human noncoding sequences," Current Opinion in Structural Biology, vol. 7, pp. 399-406, 1997.
D. S. a. D. I. Ratne, "Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins," PLOS one, pp. 1-18, 2016.
S. Rajasekaran, "Computational techniques for motif search," Frontiers in Bioscience, no. 14, pp. 5052-5065, 2009.
M. Frances and A. Litman, "On Covering Problems of Codes," Theory of Computing Systems, vol. 30, no. 2, pp. 113-119, 1997.
S. Rajasekaran, S. Balla and C. Huang, "EXACT ALGORITHMS FOR PLANTED MOTIF CHALLENGE PROBLEMS," APBC, pp. p249-259, 2005.
J. Davila, S. Balla and S. Rajasekaran, "Space and Time Efficient Algorithms for Planted," Proc. Second Int Workshop Bioinformatics Research and Applications, May 2006.
S. Rajasekaran and H. Dinh, "A Speedup Technique for (l, d) Motif Finding Algorithms," BMC Research Notes, vol. 4, no. 54, Mar. 2011.
P. Kuksa and V. Pavlovic, "Efficient Motif Finding Algorithms for Large-Alphabet Inputs," BMC Bioinformatics, vol. 11, May 2010.
H. Dinh, S. Rajasekaran and V. Kundeti, "PMS5: An Efficient Exact Algorithm for the (l, d) Motif Finding Problem," BMC Bioinformatics, vol. 12, Oct. 2011.
S. Bandyopadhyay, S. Sahni and S. Rajasekaran, "PMS6: A Fast Algorithm for Motif Discovery," in Proc. IEEE Second Int’l Conf. Computational Advances in Bio and Medical Sciences (ICCABS ’12), Feb. 2012.
M. N. a. S. Rajasekaran, "Efficient sequential and parallel algorithms for planted motif search," BMC Bioinformatics, pp. DOI: 10.1186/1471-2105-15-34, 2014.
M. N. a. S. Rajasekaran, "qPMS9: An Efficient Algorithm for Quorum Planted Motif Search," Scientific Reports, p. doi: 10.1038/srep07813, 2015.
Davila, J.; Balla, S.; Rajasekaran, S., "Fast and Practical Algorithms for Planted (l,d) Motif Search," IEEE/ACM Trans. on Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544-552, 2007.
F. Chin and H. Leung, "Voting Algorithms for discovering long motifs," in Proceedings of the Third Asia-Pacific Bioinformatics Conference (APBC2005), Singapore, pages 261-271, 2005.
N. Pisanti, A. Carvalho, L. Marsan and M. Sagot, "Risotto: Fast extraction of motifs with mismatches," in Proceedings of the 7th Latin American Theoretical Informatics Symposium, 2006.
T. Baily and C. Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers," in Proceedings of Second International Conference on Intelligent Systems for Molecular Biology, 1994.
J. Buhler and M. Tompa, "Finding motifs using random projections," Proceedings of Fifth Annual International Conference on Computational Molecular Biology (RECOMB), pp. 69-76, 2001.
C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald and e. al, "Detecting subtle sequence signals: a gibbs sampling strategy for multiple alignment," Science, vol. 262, pp. 208-214, 1993.
P. Pevzner and S. Sze, "Combinatorial approaches to finding subtle signals in dna sequences," Proceedings of Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 269-278, 2000.
E. Rocke and M. Tompa, "An algorithm for finding novel gapped motifs in dna sequences," Proceedings of Second International Conference on Computational Molecular Biology (RECOMB), pp. 228-233, 1998.
U. Keich and P. Pevzner, "Finding motifs in the twilight zone," Bioinformatics, vol. 18, pp. 1374-1381, 2002.
A. Price, S. Ramabhadran and P. Pevzner, "Finding subtle motifs by branching from sample strings," Bioinformatics, vol. 1, pp. 1-7, 2003.
G. Hertz and G. Stormo, "Identifying dna and protein patterns with statistically significant alignments of multiple sequences," Bioinformatics, vol. 15, pp. 563-577, 1999.
A. M. J. L. A. a. A. K. Joan Serr, "A Genetic Algorithm to Discover Flexible Motifs with Support," arXiv:1511.04986v2 [cs.LG], pp. 1-9, 2016.
M. Blanchette, "Algorithms for phylogenetic Footprinting," Proc. Fifth Ann. Int'l Conf. Computational Molecular Biology (RECOMB01), Apr. 2001.
J. Davila, S. Balla and S. Rajasekaran, "Pampa: An improved branch and bound algorithm for planted (l, d) motif search," Technical report, 2007.