Reference Sequence Selection For Motif Searches

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015

Abstract
The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of the t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k input sequences as reference sequences without any deliberate selection process, and thus their running time may fluctuate sharply, especially for large alphabets. In this paper, we formulate the reference sequence selection problem and propose a method named RefSelect that solves it quickly by evaluating the number of candidate motifs for the reference sequences. RefSelect brings a practical running-time improvement to state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily and efficiently, (2) in particular, lets them achieve a speedup of up to about 100x on protein data, and (3) is also suitable for large data sets that contain hundreds or more sequences.
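
To make the idea in the abstract concrete, the following is a minimal brute-force sketch in Python, not the RefSelect method itself. It assumes a single reference sequence (k = 1), generates candidates as the d-neighborhoods of the reference's l-mers, and picks the reference by counting candidates exhaustively, whereas the paper evaluates this number quickly. All function names (`candidates`, `find_motifs`, `pick_reference`, and so on) are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lmers(seq, l):
    """Set of all length-l substrings of seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def neighborhood(lmer, d, alphabet):
    """All strings within Hamming distance d of lmer (brute force)."""
    result = set()
    for r in range(d + 1):
        for positions in combinations(range(len(lmer)), r):
            for letters in product(alphabet, repeat=r):
                chars = list(lmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result


def candidates(reference, l, d, alphabet):
    """Candidate motifs generated from one reference sequence (assumes k = 1)."""
    out = set()
    for x in lmers(reference, l):
        out |= neighborhood(x, d, alphabet)
    return out


def is_motif(m, sequences, d):
    """True if every sequence contains an l-mer within distance d of m."""
    l = len(m)
    return all(any(hamming(m, y) <= d for y in lmers(s, l)) for s in sequences)


def find_motifs(sequences, l, d, alphabet, ref_index=0):
    """Return (set of (l, d) motifs, number of candidates that were verified)."""
    cands = candidates(sequences[ref_index], l, d, alphabet)
    return {m for m in cands if is_motif(m, sequences, d)}, len(cands)


def pick_reference(sequences, l, d, alphabet):
    """Index of the sequence that yields the fewest candidates.

    Illustrative only: this counts candidates exhaustively instead of
    estimating their number.
    """
    return min(
        range(len(sequences)),
        key=lambda i: len(candidates(sequences[i], l, d, alphabet)),
    )


if __name__ == "__main__":
    seqs = ["ACGTACGT", "AAAAAAAA", "ACGAACGA"]
    best = pick_reference(seqs, 3, 1, "ACGT")
    motifs, n_cands = find_motifs(seqs, 3, 1, "ACGT", ref_index=best)
    print(best, n_cands, sorted(motifs))
```

Every (l, d) motif lies within distance d of some l-mer in each input sequence, so the motif set found is the same whichever reference is used. Only the number of candidates to verify changes, and that is the quantity a selection step tries to keep small.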
Keywords
Planted (l, d) motif search, pattern-driven, reference sequences