Segmentation using large language models: A new typology of American neighborhoods

Alex D. Singleton, Seth Spielman

EPJ Data Science (2024)

Abstract
In the United States, recent changes to the National Statistical System have amplified the trade-off between geographic and demographic resolution. That is, when working with demographic and economic data from the American Community Survey (ACS), zooming in geographically means losing demographic resolution because of very large margins of error. In this paper, we present a solution to this problem in the form of an AI-based, open, and reproducible geodemographic classification system for the United States built on small area estimates from the ACS. We apply a partitioning clustering algorithm to a range of socio-economic, demographic, and built environment variables. Our approach uses an open-source software pipeline that ensures adaptability to future data updates. A key innovation is the integration of GPT4, a state-of-the-art large language model, to generate intuitive cluster descriptions and names. This represents a novel application of natural language processing in geodemographic research and showcases the potential for human-AI collaboration within the geospatial domain.
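The two-stage workflow described in the abstract (cluster standardized variables, then summarize each cluster for a language model to name) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: k-means stands in for the unspecified "partitioning clustering algorithm", the data are synthetic, the variable names (`median_income`, `pct_renters`, `housing_density`) are hypothetical, and `build_naming_prompt` is an illustrative helper rather than the authors' actual prompt. No language model is called; the sketch only builds the text that could be sent to one.

```python
import numpy as np


def standardize(X):
    """Z-score each column so variables on different scales contribute equally."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    return (X - mu) / sigma


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the paper's partitioning algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct, randomly chosen observations.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean of its members; keep the old
        # center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def build_naming_prompt(center, variable_names):
    """Hypothetical helper: turn one cluster's mean z-scores into prompt text."""
    lines = [f"{name}: {z:+.2f}" for name, z in zip(variable_names, center)]
    return (
        "Suggest a short name and description for a neighborhood type with "
        "these average z-scores:\n" + "\n".join(lines)
    )


# Synthetic stand-in for small-area estimates: two well-separated groups.
variable_names = ["median_income", "pct_renters", "housing_density"]
data_rng = np.random.default_rng(42)
group_a = data_rng.normal(loc=[0.0, 0.0, 0.0], scale=0.1, size=(50, 3))
group_b = data_rng.normal(loc=[5.0, 5.0, 5.0], scale=0.1, size=(50, 3))

X = standardize(np.vstack([group_a, group_b]))
labels, centers = kmeans(X, k=2)
prompt = build_naming_prompt(centers[0], variable_names)
print(prompt)
```

Standardizing first matters for any distance-based partitioning method, because otherwise a variable measured in dollars would dominate one measured in percentages. In a real pipeline the prompt text would be passed to the language model, and the returned names reviewed by a person.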
Keywords
Geodemographics, Large Language Model (LLM), American Community Survey, Segmentation, Neighborhoods, Artificial Intelligence (AI), Demographics, Retrieval Augmented Generation (RAG)