Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data.

BIOINFORMATICS(2020)

引用 12|浏览21
暂无评分
摘要
Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias correction step(s), which is based on biological considerations-such as GC content-and applied in single samples separately. The main problem is that not all biases are known. Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model X beta, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast XAEM considers X beta as a bilinear model with both X and beta unknown. Joint estimation of X and beta is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and beta. For speed XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要