Biography
We are interested in data science, which includes topics such as data mining, machine learning, artificial intelligence, big data, and databases.
Data mining is the process of discovering knowledge or patterns from massive amounts of data. Machine learning constructs algorithms that automatically improve with data. Specifically, we are interested in analyzing structured data, such as graphs. Our three current major projects are as follows.
Our first major project is the clustering of evolving graphs. A human network is represented by a graph in which each vertex corresponds to a person and each edge to a relationship between two people. A human network can vary over time, so it is represented by an evolving graph. Most conventional methods cannot detect the splitting and merging of clusters because they assume that the number of clusters is constant over time. We have developed an algorithm for clustering evolving graphs that detects not only human communities as clusters but also the splitting and merging of these communities. Our method is based on spectral clustering, a hard clustering method, which makes it fast, accurate, and robust to outliers.
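The spectral step can be illustrated on a single snapshot of a network. The sketch below is not the evolving-graph algorithm described above; it only shows, under simplifying assumptions, how the sign pattern of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian) splits a small graph into two communities. The function name `fiedler_bipartition`, the toy graph, and the use of power iteration instead of a full eigensolver are all assumptions made for illustration.

```python
def fiedler_bipartition(n, edges, iters=500):
    """Split a simple undirected graph into two clusters by the sign of
    an approximate Fiedler vector of its Laplacian L = D - A.

    Hypothetical helper for illustration only: it uses power iteration
    on (c*I - L) restricted to vectors orthogonal to the all-ones vector.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(neighbors) for neighbors in adj]
    c = 2 * max(deg)  # upper bound on the largest eigenvalue of L

    x = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        # Project out the constant eigenvector (eigenvalue 0 of L).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (c*I - L) x, where (L x)[i] = deg[i]*x[i] - sum of neighbor values.
        y = [c * x[i] - (deg[i] * x[i] - sum(x[j] for j in adj[i]))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]

    # Vertices with the same sign fall into the same cluster.
    return [0 if xi < 0 else 1 for xi in x]


# Toy snapshot: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = fiedler_bipartition(6, edges)
print(labels)
```

Applying such a step independently to each time step would not by itself track clusters over time; linking clusters across snapshots to detect splits and merges is the part the project addresses.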
Another major project is graph classification. Chemical compounds can be represented by graphs, where each vertex and edge correspond to an atom and a chemical bond, respectively. It is beneficial to predict the properties of newly developed chemical compounds by learning from existing chemical compounds and their properties. However, most conventional machine learning methods assume that each example, such as a chemical compound, is represented as a vector, and there is no way to convert chemical compounds to vectors without information loss. One way to analyze chemical compounds without converting them to vectors is to use machine learning algorithms with kernel methods. We have developed a kernel called the shortened Hadamard code kernel (SHCK), which is based on the Hadamard code. SHCK combines fast computation with a precise measure of the similarity between two graphs.
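To make the idea of a graph kernel concrete, here is a minimal sketch of a different, well-known kernel, the Weisfeiler-Lehman subtree kernel. It is not SHCK and does not use Hadamard codes; it is shown only because it follows the same general pattern of comparing two labeled graphs directly, without first converting them to fixed-length vectors. The function name `wl_subtree_kernel`, the graph encoding as a `(labels, adjacency)` pair, and the toy molecules (heavy atoms only, bond types ignored) are assumptions for illustration.

```python
from collections import Counter


def wl_subtree_kernel(g1, g2, h=1):
    """Weisfeiler-Lehman subtree kernel between two vertex-labeled graphs.

    Each graph is a pair (labels, adj): labels[v] is the label of vertex v
    and adj[v] is the list of its neighbors. Edge labels are ignored in
    this simplified sketch.
    """
    def refine(labels, adj):
        # New label = own label plus the sorted multiset of neighbor labels.
        return [(labels[v], tuple(sorted(labels[u] for u in adj[v])))
                for v in range(len(labels))]

    labels1, adj1 = g1
    labels2, adj2 = g2
    labels1, labels2 = list(labels1), list(labels2)

    total = 0
    for step in range(h + 1):
        # Count shared labels at this refinement level.
        count1, count2 = Counter(labels1), Counter(labels2)
        total += sum(count1[label] * count2[label] for label in count1)
        if step < h:
            labels1 = refine(labels1, adj1)
            labels2 = refine(labels2, adj2)
    return total


# Heavy-atom skeletons: ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ether = (["C", "O", "C"], [[1], [0, 2], [1]])

print(wl_subtree_kernel(ethanol, ether))    # 5
print(wl_subtree_kernel(ethanol, ethanol))  # 8
```

The two molecules have the same atom counts, so they agree at refinement level 0, but their neighborhoods differ, so the cross-similarity (5) is lower than the self-similarity of ethanol (8).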
The third major project concerns labeled graph enumeration. As mentioned above, chemical compounds can be represented by graphs. In addition, the atom and bond types in a chemical compound correspond to the labels of the vertices and edges in the graph. Although it is said that there are more than 10^60 possible chemical compounds, only 10^8 chemical compounds are available to the general population. Our challenge in this project is to quickly enumerate the labeled graphs that can be found in nature. We have recently developed software that can enumerate about 75,000 labeled graphs per second without parallel computation.
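What "enumerating labeled graphs" means can be shown with a deliberately naive sketch. It is not the software mentioned above and makes no attempt at speed: it tries every vertex labeling and edge set, keeps the connected graphs, and removes isomorphic duplicates by taking the smallest encoding over all vertex permutations. The function names, the restriction to vertex labels only (no bond labels, no valence rules), and the tiny sizes are assumptions for illustration.

```python
from itertools import combinations, permutations, product


def is_connected(n, edges):
    """Return True if the graph on vertices 0..n-1 is connected."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical_form(n, labels, edges):
    """Smallest (labels, edges) encoding over all vertex renumberings."""
    best = None
    for perm in permutations(range(n)):  # perm[v] is the new index of v
        new_labels = [None] * n
        for v in range(n):
            new_labels[perm[v]] = labels[v]
        new_edges = tuple(sorted(tuple(sorted((perm[u], perm[v])))
                                 for u, v in edges))
        key = (tuple(new_labels), new_edges)
        if best is None or key < best:
            best = key
    return best


def enumerate_labeled_graphs(n, vertex_labels):
    """All connected vertex-labeled graphs on n vertices, up to isomorphism."""
    pairs = list(combinations(range(n), 2))
    found = set()
    for labels in product(vertex_labels, repeat=n):
        # A connected graph on n vertices needs at least n - 1 edges.
        for k in range(n - 1, len(pairs) + 1):
            for edges in combinations(pairs, k):
                if is_connected(n, edges):
                    found.add(canonical_form(n, labels, edges))
    return sorted(found)


graphs = enumerate_labeled_graphs(3, ["C", "O"])
print(len(graphs))  # 10: six labeled paths and four labeled triangles
```

The cost grows factorially with the number of vertices, which is why practical enumerators rely on canonical construction rather than brute-force duplicate removal.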
Papers (61 in total)
IJCNN, pp. 1-8 (2023)
Molecular Systems Design & Engineering, no. 4 (2023): 431-435
Semantic Scholar (2021)
Semantic Scholar (2020)