The Pairwise Sequence Alignment Problem

semanticscholar(2011)

Abstract
The nucleotide or protein sequence of many human genes is similar to that of a gene in another organism. Genes from two different organisms can have similar sequences for a variety of reasons, but often the genes share a common ancestor and are said to be related or homologous. Figure 1 illustrates two homologous sequences from part of a gene that codes for a leukemia transcription factor found in many vertebrates, including humans and zebrafish. Approximately 30% of human genes have homologs in the genome of a worm, 50% in the genome of a fly, 90% in the genome of a fish, and 99% in the genome of a chimpanzee. Over time, genes evolve, and homologous genes are likely to diverge through nucleotide mutations, insertions, and deletions. However, too much change to a gene sequence may cause a loss of its function, with negative effects on the organism's fitness, and such changes will be selected against in the process of evolution.
A fundamental problem in genomics is determining whether two sequences, such as two DNA sequences or two protein sequences, are related. Here, we restrict ourselves to primary sequences, though there are many interesting and important problems pertaining to structural and functional relationships. Given two DNA sequences, one whose functional role is known and one whose functional role is unknown, recognizing a relationship between the two might suggest that the sequence with unknown function has a role similar to that of the sequence with known function. Indeed, identification and investigation of homologs of the pbx1 gene in humans, flies, and fish, shown in part in Figure 1, helped elucidate the role of this oncogene in tumor progression [1, 2]. More broadly, in modern genomics, the annotation of newly sequenced genomes is primarily established through comparative genomics approaches, i.e., through recognizing relationships between a newly sequenced genome and previously annotated genomic sequences [3].
Ideally, we would establish a firm relationship between the sequences, e.g., that the two sequences are derived from a common ancestor and, hence, homologous. Unfortunately, without being able to observe the evolution of the sequences from their common ancestor, it is all but impossible to prove a homologous relationship between them. Though we cannot generally prove homology of sequences, we can often estimate the similarity of two sequences, and from this similarity we can hypothesize or infer homology. Like many bioinformatics methods, then, computational approaches that assess the similarity of two genomic sequences are generally hypothesis-generating. Viewed through the lens of the scientific method, these approaches emphasize generating hypotheses, in contrast to more experimentally oriented approaches, which emphasize testing them.
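To make the idea of estimating the similarity of two sequences concrete, the following is a minimal sketch of a global alignment score computed by dynamic programming, in the style of the Needleman-Wunsch algorithm. The abstract does not specify a scoring scheme or an algorithm, so the function name `global_alignment_score` and the parameters `match`, `mismatch`, and `gap` (with a simple linear gap penalty) are assumptions chosen purely for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not taken from the text:
    +1 per matching pair, -1 per mismatching pair, -1 per gap position.
    """
    # prev[j] holds the best score for aligning the already-processed
    # prefix of a with b[:j]; initially that prefix is empty, so every
    # character of b[:j] must be paired with a gap.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            up = prev[j] + gap              # align a[i-1] with a gap
            left = curr[j - 1] + gap        # align b[j-1] with a gap
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical short sequences, used only to exercise the function.
    print(global_alignment_score("ACGT", "ACGT"))
    print(global_alignment_score("ACGT", "AGT"))
```

A higher score indicates greater similarity under the assumed scoring scheme. By itself it supports only a hypothesis of homology, not a proof, which is consistent with the hypothesis-generating role described above.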