A methodology for preprocessing structured big data in the behavioral sciences

Behavior Research Methods (2022)

Abstract
The characteristics of big data, including high volume, increased variety, and velocity, pose special challenges for data analysis. As these characteristics generally preclude manual data inspection and processing, researchers must often use computational methodologies to deal with this type of data, techniques that may be unfamiliar to nonspecialists, including behavioral scientists. However, existing data analytics methodologies from the field of computer science, developed to handle the generic tasks of data collection, preprocessing, and analysis, can be appropriated for use in other disciplines. These methodologies involve a sequential pipeline of quality checks to prepare data sets for analysis and application. Building upon these methodologies, this paper describes the Big Data Quality & Statistical Assurance (BDQSA) model, intended for researchers in the behavioral sciences. It involves a series of data preprocessing tasks to achieve data understanding, as well as data screening, cleaning, and transformation. These are followed by a statistical quality phase, which includes extraction of the relevant data subset, type conversions, ensuring sample representativeness when appropriate, and assessing statistical assumptions. The resulting model thereby provides methodological guidance for the preprocessing of behavioral science big data, aimed at ensuring acceptable data quality before analysis is undertaken. Sample R code snippets demonstrating the application of this model are provided throughout the paper.
Keywords
Behavioral sciences, Behavioral science research, Big data, Data preprocessing, Personality big data