The Site Linkage Spectrum of Data Arrays

arXiv (Cornell University) (2024)

University of Virginia Department of Computer Science

Abstract
A new perspective is introduced regarding the analysis of Multiple Sequence Alignments (MSA), representing aligned data defined over a finite alphabet of symbols. The framework is designed to produce a block decomposition of an MSA, where each block consists of sequences exhibiting a certain site-coherence. The key component of this framework is an information-theoretic potential defined on pairs of sites (links) within the MSA. This potential quantifies the expected drop in variation of information between the two constituent sites, where the expectation is taken with respect to all possible sub-alignments obtained by removing a finite, fixed collection of rows. It is proved that the potential is zero for linked sites representing columns whose symbols are in bijective correspondence, and that it is strictly positive otherwise. It is furthermore shown that the potential assumes its unique minimum for links at which each symbol pair appears with the same multiplicity. Finally, an application is presented regarding anomaly detection in an MSA composed of inverse fold solutions of a fixed tRNA secondary structure, where the anomalies are represented by inverse fold solutions of a different RNA structure.
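As a rough illustration of the quantity underlying the potential, the sketch below computes the plain variation of information between two columns of an alignment, using the standard identity VI(A, B) = 2 H(A, B) - H(A) - H(B). It does not implement the paper's potential, which averages the drop in this quantity over sub-alignments; the toy alignment and the function names are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def variation_of_information(col_a, col_b):
    """VI(A, B) = 2 H(A, B) - H(A) - H(B) for two equally long columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of rows")
    joint = list(zip(col_a, col_b))  # empirical joint distribution of symbol pairs
    return 2 * entropy(joint) - entropy(col_a) - entropy(col_b)


# Toy alignment, made up for illustration: rows are sequences, columns are sites.
msa = ["ACGU", "ACGA", "GUCU", "GUCA"]
columns = list(zip(*msa))

# Sites 0 and 1 are in bijective correspondence (A<->C, G<->U).
vi_bijective = variation_of_information(columns[0], columns[1])

# Sites 0 and 3 vary independently: every symbol pair occurs exactly once.
vi_independent = variation_of_information(columns[0], columns[3])
```

In this toy case the variation of information vanishes for the bijective pair of sites and is larger for the pair in which all symbol pairs are equally frequent, which is the kind of contrast between sites that the abstract's statements about the potential refer to.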
Key words
Secondary Structure Prediction