Polypeptide sequence property relationships in Escherichia coli based on auto cross covariances

Chemometrics and Intelligent Laboratory Systems (1995)

Abstract
Multivariate classification and quantitative structure-activity studies of proteins involve amino acid sequences of different lengths, so preprocessing methods are needed that translate each sequence into a quantitative description with the same number of variables. Here, three such preprocessing methods are investigated. Two of them are variants of auto cross covariances calculated from a multipositional description of the protein sequence, in which each amino acid is represented by three orthogonal scales that describe it physico-chemically. The third method quantifies each sequence by a diamino acid frequency histogram. The methods are evaluated by classifying 106 proteins from Escherichia coli and other Gram-negative bacteria. The proteins were divided into four classes according to their location in the cell: cytoplasm, inner membrane, periplasm and outer membrane. PLS discriminant analysis was used for the subsequent classification. The results showed that one of the variants of auto cross covariances and the diamino acid frequency histogram representation contained much information related to the given classification problem. Hence, the amino acid sequences of proteins with different final locations in Escherichia coli have significant features related to protein structure and location.
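To make the idea of a fixed-length description concrete, the following is a minimal Python sketch of two of the representations named in the abstract. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the formulas for the two auto cross covariance variants, so the lag-averaged product used here is only one common form. The descriptor table `TOY_SCALES`, the function names, and the parameter `max_lag` are hypothetical, and the numeric values are placeholders rather than the published physico-chemical scales.

```python
from itertools import product

# Hypothetical placeholder descriptors (three values per residue).
# These are NOT the published orthogonal scales; they only illustrate the data layout.
TOY_SCALES = {
    "A": (0.1, -1.0, 0.5),
    "G": (2.0, -2.0, -1.0),
    "K": (2.5, 1.0, -2.0),
    "L": (-4.0, -1.5, -1.0),
}


def auto_cross_covariances(seq, scales, max_lag):
    """Return a fixed-length vector of lagged products between scales.

    Assumed form: for each lag and each ordered pair of scales (j, k),
    average z_j at position i times z_k at position i + lag over all valid i.
    The vector length is max_lag * n_scales**2, independent of len(seq).
    """
    n = len(seq)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    z = [scales[aa] for aa in seq]  # multipositional description
    n_scales = len(z[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_scales):
            for k in range(n_scales):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


def dipeptide_histogram(seq, alphabet):
    """Return relative frequencies of all ordered adjacent residue pairs.

    The vector length is len(alphabet)**2, independent of len(seq).
    """
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    n_pairs = max(len(seq) - 1, 1)  # avoid division by zero for short input
    return [counts[p] / n_pairs for p in pairs]
```

With these helpers, `auto_cross_covariances("AGKL", TOY_SCALES, max_lag=2)` and `auto_cross_covariances("AGKLLKGA", TOY_SCALES, max_lag=2)` return vectors of the same length even though the sequences differ in length, which is the property a downstream method such as PLS discriminant analysis needs. A real analysis would use the full 20-residue alphabet and published descriptor scales.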
Keywords
Peptide sequences, partial least squares discriminant analysis, protein classification, sequence analysis, auto cross covariances, multivariate data analysis