Detecting Semantic Similarity of Diseases based Machine Learning.

2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2023)

Abstract
Identifying disease-disease associations (DDAs) is critical in medicine and systems biology because it reveals the complex relationships among different diseases. Recent advances in systems biology and the increasing availability of diverse biological data have created opportunities for such research. However, the heterogeneity of the data makes disease associations difficult to identify, and effective methods for predicting DDAs need to leverage data at the molecular level. In this study, using the Grouping-Scoring-Modeling approach, we develop a statistical technique that computes a semantic similarity metric between the disease under study and other diseases. We also use Jaccard similarity to compute similarity among disease groups. Using GediNETPro, which relies on machine learning techniques, we record, across the Monte Carlo cross-validation iterations, every rank derived from the scores that the S (Scoring) component assigns to each disease group. The K-means clustering algorithm is then applied to the recorded ranks of each disease group to detect patterns of similar diseases by placing them into coherent clusters. Each cluster is assigned a cluster score (cluster_score), computed as the average rank of its members. The lower the cluster score, the more significant the cluster. The semantic approach is then applied to the top-ranked clusters to detect semantic relationships between the disease under investigation and other diseases (groups). In addition, we apply diversity metrics based on the Jaccard measure to each disease cluster to support the semantic findings; a high diversity value indicates a greater likelihood of detecting distinct groups with semantic relationships. This study improves our understanding of disease associations and may help improve treatment strategies, which is especially important where disease associations go beyond the scope of genetic factors.
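As a rough illustration of three quantities named in the abstract, the sketch below computes a Jaccard similarity between two disease groups, a cluster_score as the average rank of a cluster's members, and a Jaccard-based diversity value. It is not the authors' implementation. The gene sets, mean ranks, and cluster labels are hypothetical toy data, the labels stand in for the output of a K-means step that is not shown, and defining diversity as the mean pairwise Jaccard distance is an assumption, since the abstract does not give the exact formula.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_scores(mean_ranks, labels):
    """Average the ranks of each cluster's members; lower means more significant."""
    by_cluster = {}
    for group, label in labels.items():
        by_cluster.setdefault(label, []).append(mean_ranks[group])
    return {label: mean(ranks) for label, ranks in by_cluster.items()}


def diversity(gene_sets):
    """Assumed diversity metric: mean pairwise Jaccard distance within a cluster."""
    pairs = list(combinations(gene_sets, 2))
    return mean(1 - jaccard(a, b) for a, b in pairs) if pairs else 0.0


# Hypothetical toy data: each disease group is represented by a set of genes.
genes = {
    "disease_A": {"TP53", "BRCA1", "EGFR"},
    "disease_B": {"TP53", "BRCA1", "KRAS"},
    "disease_C": {"INS", "GCK"},
}
# Hypothetical per-group ranks, already averaged over cross-validation runs.
mean_ranks = {"disease_A": 1.5, "disease_B": 2.5, "disease_C": 9.0}
# Hypothetical cluster labels, standing in for a K-means result.
labels = {"disease_A": 0, "disease_B": 0, "disease_C": 1}

scores = cluster_scores(mean_ranks, labels)
best = min(scores, key=scores.get)  # cluster with the lowest cluster_score
```

With this toy data, `best` selects the cluster containing disease_A and disease_B, and `diversity` can then be evaluated on the gene sets of that cluster's members.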
Keywords
disease-disease associations,machine learning,Grouping-Scoring-Modeling approach,semantic similarity,Jaccard similarity,GediNETPro,Monte Carlo cross-validation,K-means clustering algorithm,diversity metrics,Jaccard index