MINDFUL: A method to identify novel and diverse signals with fast unsupervised learning

F1000Research (2019)

Abstract
With rapid advances in experimental methods that map transcription start sites (TSSs) at high resolution, there is a need to characterize the sequence diversity of TSS neighborhoods. Most current techniques scan for previously discovered elements, such as the TATA box, the INR motif, and CpG islands, to categorize promoters into classes. Reliance on such elements hinders the discovery of novel ones. Methods that use standard motif discovery to find de novo promoter elements are also limited, because a motif is detected only if it is over-represented in the dataset; an element that appears in only a small set of promoters can thus be missed. We previously developed a clustering-based approach that uses no prior knowledge of elements to solve this problem []. That method uses Gibbs sampling to learn the model parameters, but is untenable on large datasets. Here we propose a new, fast method called MINDFUL that uses a greedy k-means-like approach to cluster promoters aligned by TSSs into diverse classes, while also learning the optimal value of k. It is general enough to be used for any data with categorical variables and is not restricted to DNA.
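To make the idea of a k-means-like clustering of TSS-aligned sequences concrete, here is a minimal sketch in Python. It is not the MINDFUL algorithm itself: it assumes a fixed number of clusters k (the abstract says MINDFUL also learns k), uses plain hard reassignment rather than the paper's greedy procedure, and all names (`fit_profile`, `log_likelihood`, `cluster_sequences`, the pseudocount value) are hypothetical. Each cluster is summarized by a per-position categorical distribution, and each sequence is reassigned to the cluster under which it is most likely.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA; any categorical alphabet would work


def fit_profile(seqs, length, pseudocount=1.0):
    """Estimate a per-position categorical distribution with additive smoothing."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def log_likelihood(seq, profile):
    """Log-probability of a sequence under a position-specific profile."""
    return sum(math.log(profile[pos][ch]) for pos, ch in enumerate(seq))


def cluster_sequences(seqs, k, max_iter=50):
    """Hard-assignment clustering of equal-length, aligned sequences."""
    length = len(seqs[0])
    # Deterministic round-robin initialization (an illustrative choice).
    labels = [i % k for i in range(len(seqs))]
    for _ in range(max_iter):
        # Update step: refit one profile per cluster from its current members.
        profiles = [
            fit_profile([s for s, lab in zip(seqs, labels) if lab == c], length)
            for c in range(k)
        ]
        # Assignment step: move each sequence to its most likely cluster.
        new_labels = [
            max(range(k), key=lambda c: log_likelihood(s, profiles[c]))
            for s in seqs
        ]
        if new_labels == labels:  # converged: no sequence changed cluster
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    toy = ["TATAAA"] * 3 + ["GCGCGC"] * 3  # toy data, not real promoters
    print(cluster_sequences(toy, k=2))
```

Because the profiles are categorical distributions rather than numeric centroids, the same loop applies to any aligned data with categorical variables, which matches the generality claimed in the abstract.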