Estimation of discrete choice models considering simultaneously multiple objectives and complex data characteristics

Prithvi Bhat Beeramoole, Ryan Kelly, Md Mazharul Haque, Alban Pinz, Alexander Paz

Transportation Research Part C: Emerging Technologies (2024)

Abstract
This paper focuses on the discrete choice estimation problem, which involves multiple objectives and testing a broad range of hypotheses that can affect both interpretability and prediction accuracy. Previous studies have proposed mathematical programming formulations to assist with hypothesis testing and estimation. However, there is limited knowledge regarding the effect of in- and out-of-sample model performance criteria during the search for parsimonious specifications. To address this knowledge gap, a multi-objective optimization framework is proposed, including both in-sample goodness-of-fit and out-of-sample predictive accuracy, to generate multiple unique specifications and perform extensive hypothesis testing that simultaneously considers potential explanatory variables, their functional forms, nonlinearities, heterogeneous effects, and correlations. A metaheuristic was designed and implemented to solve the proposed multi-objective nonlinear mixed-integer mathematical programming problem. Experiments, including various datasets and discrete choices, were used to illustrate the efficacy of the proposed framework. The goal was to find specifications that are either similar to or dominate those reported in the literature, considering both interpretability and prediction accuracy. Important insights regarding potential explanatory factors and heterogeneous preferences, which were not reported in the literature, were captured using the proposed framework. In addition, for one of the datasets used in this study, the proposed framework enabled the discovery of three distinct clusters considering specification type and model performance in terms of interpretability and prediction accuracy. For the given dataset, these clusters suggest that the proposed approach allowed extensive exploration of the data across different specification types.
In addition, the Mixed-Logit models with correlated parameters were found to perform significantly better in terms of in-sample fit than those without correlation. Similarly, multinomial-Logit models showed the worst performance for the given dataset. In contrast, multinomial-Logit models provided superior out-of-sample fit relative to advanced specifications, which illustrates trade-offs between in- and out-of-sample model fitness. A comparative analysis, including multiple performance measures, was also conducted. The results suggest that model evaluation using in-sample Bayesian Information Criterion (BIC) and out-of-sample Mean Absolute Error (MAE), or in-sample BIC and out-of-sample Mean Squared Error (MSE), enables estimation of specifications with better in- and out-of-sample performance compared to those estimated using maximum log-likelihood and minimum number of model parameters. In addition, a mostly linear relationship was observed between in-sample and out-of-sample log-likelihood, indicating that the latter does not provide much additional information regarding prediction compared to the in-sample estimates. These results showed the value of using an optimization framework to support modelling decisions by enabling extensive hypothesis testing and including multiple performance criteria as well as complex data characteristics to discover important and reliable insights.
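To make the idea of one specification "dominating" another concrete, the following is a minimal sketch in Python, not the authors' metaheuristic. It assumes each candidate specification is scored by two objectives to be minimized: in-sample BIC and out-of-sample MAE. The abstract does not state how MAE is computed, so taking it over observed versus predicted choice shares is an assumption, and the function names and candidate values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard Bayesian Information Criterion: k*ln(n) - 2*LL (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, here assumed to be taken over choice shares."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (BIC, MAE) scores for three candidate specifications.
candidates = [(100.0, 0.30), (120.0, 0.20), (130.0, 0.35)]
front = pareto_front(candidates)
```

With these illustrative values, the third candidate is worse than the first on both objectives and is dropped, while the first two remain because each is better on one objective. This is the kind of trade-off between in- and out-of-sample fit that the abstract describes.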
Keywords
Discrete choice models, Discrete choice, Multi-objective, Optimization, Metaheuristic