BCR CDR3 length distributions differ between blood and spleen and between old and young patients, and TCR distributions can be used to detect myelodysplastic syndrome.

PHYSICAL BIOLOGY(2013)

引用 14|浏览4
暂无评分
摘要
Complementarity-determining region 3 (CDR3) is the most hyper-variable region in B cell receptor (BCR) and T cell receptor (TCR) genes, and the most critical structure in antigen recognition and thereby in determining the fates of developing and responding lymphocytes. There are millions of different TCR V beta chain or BCR heavy chain CDR3 sequences in human blood. Even now, when high-throughput sequencing becomes widely used, CDR3 length distributions (also called spectratypes) are still a much quicker and cheaper method of assessing repertoire diversity. However, distribution complexity and the large amount of information per sample (e. g. 32 distributions of the TCR alpha chain, and 24 of TCR beta) calls for the use of machine learning tools for full exploration. We have examined the ability of supervised machine learning, which uses computational models to find hidden patterns in predefined biological groups, to analyze CDR3 length distributions from various sources, and distinguish between experimental groups. We found that (a) splenic BCR CDR3 length distributions are characterized by low standard deviations and few local maxima, compared to peripheral blood distributions; (b) healthy elderly people's BCR CDR3 length distributions can be distinguished from those of the young; and (c) a machine learning model based on TCR CDR3 distribution features can detect myelodysplastic syndrome with approximately 93% accuracy. Overall, we demonstrate that using supervised machine learning methods can contribute to our understanding of lymphocyte repertoire diversity.
更多
查看译文
关键词
myelodysplastic syndrome,tcr distributions,spleen
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要