CatMapper: A user-friendly tool for integrating data across complex categories

Daniel Hruschka, Robert Bischoff, Matthew Peeples, I-Han Hsiao, Mohamed Sarwat

semanticscholar(2022)

Abstract
We introduce CatMapper (catmapper.org), a set of user-friendly, web-based tools designed to help researchers overcome a common bottleneck in comparative research: integrating data across diverse datasets by complex categories (e.g., ethnicities, languages, religions, archaeological artifact types) that are often encoded very differently from dataset to dataset. We illustrate CatMapper's planned architecture and capabilities with the SocioMap tool (catmapper.org/sociomap), which focuses on four inter-related domains: ethnicities (>9,000), religions (>1,000), districts (>200,000), and languages, language families, and dialects (>25,000). Categories in these domains share features that make them challenging to work with, including large numbers of categories at multiple nested scales that can also change through time. To assist users in merging data by these categories, SocioMap will include four core functions: (1) explore contextual information about specific categories, (2) translate new sets of categories from existing datasets and published studies, (3) integrate novel combinations of datasets for researchers' custom analysis needs, including automatically generated syntax (e.g., R, Stata) to merge datasets of interest, and (4) share merging templates for public re-use and open science. We outline current progress on the development of CatMapper/SocioMap, plans for future development, and potential expansion to other domains, such as artifact types in archaeology and material goods used in asset-based wealth indices.
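The translate and integrate functions described above can be pictured as a crosswalk-based merge: each dataset's local category labels are first mapped to a shared identifier, and the datasets are then joined on that identifier. The following is a minimal sketch of that general idea in Python, not CatMapper's actual implementation or generated syntax. All function names, field names, identifiers (such as `CAT-1`), and values are hypothetical and invented for illustration.

```python
# Hypothetical illustration only: none of these names, IDs, or values
# come from CatMapper or SocioMap.

def translate(rows, key_field, crosswalk):
    """Attach a shared category ID to each row via a dataset-specific crosswalk."""
    translated = []
    for row in rows:
        shared_id = crosswalk.get(row[key_field])  # None when the label is unmapped
        translated.append({**row, "shared_id": shared_id})
    return translated


def merge_on_shared_id(left, right):
    """Inner-join two translated datasets on their shared category ID."""
    index = {}
    for row in right:
        if row["shared_id"] is not None:
            index.setdefault(row["shared_id"], []).append(row)
    merged = []
    for left_row in left:
        # Unmapped rows (shared_id None) never match, because None is not indexed.
        for right_row in index.get(left_row["shared_id"], []):
            merged.append({**right_row, **left_row})
    return merged


# Two made-up datasets that encode the same categories differently.
dataset_a = [
    {"group_name": "Alpha people", "households": 120},
    {"group_name": "Beta people", "households": 85},
]
dataset_b = [
    {"group_code": "A01", "literacy_rate": 0.62},
    {"group_code": "C03", "literacy_rate": 0.48},
]

# Made-up crosswalks from each dataset's local labels to shared IDs.
crosswalk_a = {"Alpha people": "CAT-1", "Beta people": "CAT-2"}
crosswalk_b = {"A01": "CAT-1", "C03": "CAT-3"}

merged = merge_on_shared_id(
    translate(dataset_a, "group_name", crosswalk_a),
    translate(dataset_b, "group_code", crosswalk_b),
)
```

In this sketch only the category present in both crosswalks is joined; rows whose labels map to different shared identifiers, or to none, are dropped, which is one of several possible design choices (an outer join that keeps unmatched rows would be another).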