Handling an inconsistently coded categorical variable in a longitudinal dataset with cat2cat

Maciej Nasiński, Krzysztof Gajowniczek

SoftwareX (2023)

Abstract
Categorical variable levels change over time as categories are added, deleted, or regrouped. This study introduces the cat2cat procedure for handling an inconsistently coded categorical variable in a longitudinal dataset. Such categorical variables often represent classifications, for instance the International Standard Classification of Occupations or the International Classification of Diseases. The cat2cat procedure unifies an inconsistently coded categorical variable between two time points according to a mapping table. Categorical variable levels from a specific period are applied to a neighboring period by replicating an observation whenever it can be assigned to more than one category. Frequencies or statistical methods are then used to approximate the probability of assignment to each category. The cat2cat procedure extends the range of statistical analyses available for longitudinal datasets with inconsistently coded categorical variables, which would otherwise typically be removed or would force aggregation of the dataset. The procedure is offered to the scientific community in the cat2cat R and Python packages.
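The replication and weighting steps described above can be illustrated with a minimal sketch, assuming a plain list-based data layout. The function `replicate_with_weights` and its arguments are hypothetical and are not part of the cat2cat package API. The sketch covers only the frequency-based variant: each replicated row receives a weight equal to the share of its candidate category among the codes observed in the neighboring period, with equal weights as a fallback when none of the candidates were observed. Dropping observations whose code is missing from the mapping table is also an assumption made here for simplicity.

```python
from collections import Counter, defaultdict


def replicate_with_weights(old_obs, new_codes, mapping):
    # Hypothetical helper illustrating the idea; NOT the cat2cat package API.
    # Group the mapping table: old code -> candidate new codes (order kept, no duplicates).
    candidates = defaultdict(list)
    for old_code, new_code in mapping:
        if new_code not in candidates[old_code]:
            candidates[old_code].append(new_code)

    # Frequencies of each code in the neighboring period (frequency-based variant only).
    freq = Counter(new_codes)

    rows = []
    for obs in old_obs:
        cands = candidates.get(obs["code"], [])
        # Assumption: observations whose code is absent from the mapping table are dropped.
        if not cands:
            continue
        counts = [freq[c] for c in cands]
        total = sum(counts)
        for cand, count in zip(cands, counts):
            # Replicate the observation once per candidate category and attach a weight;
            # fall back to equal weights if no candidate was observed in the new period.
            weight = count / total if total > 0 else 1 / len(cands)
            rows.append({**obs, "new_code": cand, "weight": weight})
    return rows


if __name__ == "__main__":
    obs = [{"id": 1, "code": "A"}, {"id": 2, "code": "B"}]
    mapping = [("A", "a1"), ("A", "a2"), ("B", "b")]
    new_codes = ["a1", "a1", "a1", "a2", "b"]
    for row in replicate_with_weights(obs, new_codes, mapping):
        print(row)
```

In this toy example, observation 1 (old code `A`, which maps to two new codes) is replicated into two rows with weights 0.75 and 0.25, while observation 2 keeps a single row with weight 1.0. The weights for each original observation sum to one, so weighted analyses on the replicated data preserve the original sample size.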
Keywords
Categorical, Longitudinal, Panel, Unify