AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, July 2020, pp. 2636–2645.

Keywords:
Attention Factorization Machine, Product-based Neural Network, deep learning, collaborative filtering, feature interaction

Abstract:

Learning feature interactions is crucial for click-through rate (CTR) prediction in recommender systems. In most existing deep learning models, feature interactions are either manually designed or simply enumerated. However, enumerating all feature interactions brings large memory and computation cost. Even worse, useless interactions may introduce noise and complicate the training process.

Introduction
  • Click-through rate (CTR) prediction is crucial in recommender systems, where the task is to predict the probability of the user clicking on the recommended items [5, 27].
  • Many recommendation decisions can be made based on the predicted CTR.
  • The core of these recommender systems is to extract significant low-order and high-order feature interactions.
  • Tree models can only explore a small fraction of all possible feature interactions in recommender systems with multi-field categorical data [21], which restricts their exploration ability
Highlights
  • Click-through rate (CTR) prediction is crucial in recommender systems, where the task is to predict the probability of the user clicking on the recommended items [5, 27]
  • Inspired by the recent work DARTS [17] for neural architecture search, we propose a two-stage method, Automatic Feature Interaction Selection (AutoFIS), for automatically selecting low-order and high-order feature interactions in factorization models (see the sketch after this list)
  • We proposed Automatic Feature Interaction Selection to automatically select important 2nd- and 3rd-order feature interactions
  • The proposed methods are generally applicable to all factorization models, and the selected important interactions can be transferred to other deep learning models for click-through rate prediction
  • The proposed Automatic Feature Interaction Selection is easy to implement with marginal search cost, and the performance improvement is significant on two benchmark datasets and one private dataset
  • The proposed methods have been deployed on the training platform of the Huawei App Store recommendation service, with significant economic benefit demonstrated
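
To make the two-stage selection concrete, here is a minimal PyTorch sketch of the search-stage idea, with names of our own invention (GatedInteractionLayer, selected_pairs are illustrative, not the paper's code): every 2nd-order interaction gets a learnable gate alpha, and interactions whose gates are driven toward zero during search are pruned before re-training. The sparsity-inducing GRDA optimizer that AutoFIS applies to the gates is omitted here.

```python
import itertools

import torch
import torch.nn as nn


class GatedInteractionLayer(nn.Module):
    """Search-stage feature interaction layer (illustrative sketch).

    Each pairwise inner product <e_i, e_j> is scaled by a learnable
    gate alpha_ij; gates near zero after the search stage mark
    interactions to drop in the re-train stage.
    """

    def __init__(self, num_fields: int):
        super().__init__()
        self.pairs = list(itertools.combinations(range(num_fields), 2))
        # one architecture parameter per 2nd-order interaction
        self.alpha = nn.Parameter(torch.ones(len(self.pairs)))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, num_fields, dim) field embeddings
        inner = torch.stack(
            [(emb[:, i] * emb[:, j]).sum(-1) for i, j in self.pairs], dim=1
        )  # (batch, num_pairs) pairwise inner products
        return (self.alpha * inner).sum(dim=1)  # gated sum of interactions

    def selected_pairs(self, threshold: float = 1e-4):
        # interactions whose gates survived the search stage
        return [p for p, a in zip(self.pairs, self.alpha.tolist())
                if abs(a) > threshold]


# usage: layer = GatedInteractionLayer(num_fields=10)
#        out = layer(torch.randn(32, 10, 16))  # (32,) interaction logits
```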
Methods
  • The authors describe the proposed AutoFIS, an algorithm to select important feature interactions in factorization models automatically.

    3.1 Factorization Model (Base Model)

    First, the authors define factorization models (Definition 3.1).
  • Factorization models are models in which the interaction of several embeddings from different features is mapped to a real number by some operation, such as an inner product or a neural network.
  • FM consists of a feature embedding layer and a feature interaction layer.
  • Besides these two layers, the DeepFM and IPNN models include an extra MLP layer.
  • The difference between DeepFM and IPNN is that the feature interaction layer and the MLP layer work in parallel in DeepFM, whereas they are arranged in sequence in IPNN, as sketched below
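
The parallel-versus-sequential distinction can be shown in a short PyTorch sketch; the class names and toy dimensions below are our own illustration (linear terms and other details of the real models are omitted):

```python
import torch
import torch.nn as nn


def pairwise_inner_products(emb: torch.Tensor) -> torch.Tensor:
    # Feature interaction layer: all 2nd-order inner products.
    # emb: (batch, num_fields, dim) -> (batch, num_pairs)
    b, f, d = emb.shape
    prods = emb @ emb.transpose(1, 2)            # (batch, f, f) Gram matrix
    iu, ju = torch.triu_indices(f, f, offset=1)  # upper-triangle pair indices
    return prods[:, iu, ju]


class DeepFMStyle(nn.Module):
    """Interaction layer and MLP run in parallel; outputs are summed."""

    def __init__(self, num_fields: int, dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_fields * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        fm = pairwise_inner_products(emb).sum(dim=1, keepdim=True)
        deep = self.mlp(emb.flatten(1))
        return torch.sigmoid(fm + deep)  # parallel branches combined


class IPNNStyle(nn.Module):
    """Interaction layer feeds the MLP; the two layers are in sequence."""

    def __init__(self, num_fields: int, dim: int, hidden: int = 64):
        super().__init__()
        num_pairs = num_fields * (num_fields - 1) // 2
        self.mlp = nn.Sequential(
            nn.Linear(num_fields * dim + num_pairs, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        x = torch.cat([emb.flatten(1), pairwise_inner_products(emb)], dim=1)
        return torch.sigmoid(self.mlp(x))  # interactions consumed by the MLP
```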
Results
  • The performance is significantly improved by using both 2nd- and 3rd-order feature interactions (namely AutoIPNN(3rd)) selected by AutoFM.
Conclusion
  • The authors proposed AutoFIS to automatically select important 2nd- and 3rd-order feature interactions.
  • The proposed methods are generally applicable to all factorization models, and the selected important interactions can be transferred to other deep learning models for CTR prediction.
  • The proposed AutoFIS is easy to implement with marginal search cost, and the performance improvement is significant on two benchmark datasets and one private dataset.
  • The proposed methods have been deployed on the training platform of the Huawei App Store recommendation service, with significant economic benefit demonstrated
Tables
  • Table1: Benchmark performance: "time" is the inference time for 2 million samples. "top" represents the percentage of feature interactions kept for 2nd/3rd-order interactions. "cost" contains the GPU time of the search and re-train stages. "Rel. Impr." is the relative AUC improvement over the FM model. Note: FFM has a lower time and cost due to its smaller embedding size, imposed by a GPU memory constraint
  • Table2: Dataset Statistics
  • Table3: Performance on the Private Dataset. "Rel. Impr." is the relative AUC improvement over the FM model
  • Table4: Performance of transferring interactions selected by AutoFM to IPNN. AutoIPNN(2nd) indicates IPNN with 2nd-order interactions selected by AutoFM(2nd), and AutoIPNN(3rd) indicates IPNN with 2nd- and 3rd-order interactions selected by AutoFM(3rd)
  • Table5: Performance comparison between models using interactions selected by our method and by statistics_AUC on the Avazu dataset
  • Table6: Different Variants
  • Table7: Performance comparison of different feature interaction selection strategies. ∗: with fewer interactions, FM may have better performance
  • Table8: Comparison of one-level and bi-level optimization
  • Table9: Parameter Settings
Related work
  • CTR prediction is generally formulated as a binary classification problem [16]. In this section, we briefly review factorization models for CTR prediction and AutoML models for recommender systems.

    Factorization machine (FM) [23] projects each feature into a low-dimensional vector and models feature interactions by inner product, which works well for sparse data. Field-aware factorization machine (FFM) [12] further enables each feature to have multiple vector representations to interact with features from other fields.
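
    For concreteness, FM's prediction in [23] is the standard formula (restated here for reference, not taken from this paper's notation):

    \hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j

    where the w are linear weights and v_i is the low-dimensional embedding of feature i; the pairwise inner-product term is exactly the feature interaction layer that AutoFIS later gates and prunes.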

    Recently, deep learning models have achieved state-of-the-art performance on some public benchmarks [16, 25]. Several models use an MLP to improve FM, such as Attentional FM [26] and Neural FM [8]. Wide & Deep [5] jointly trains a wide model for artificial features and a deep model for raw features. DeepFM [7] uses an FM layer to replace the wide component in Wide & Deep. PNN [21] uses an MLP to model the interaction of the FM layer and the feature embeddings, while PIN [21] introduces a network-in-network architecture to model pairwise feature interactions with sub-networks rather than the simple inner-product operations in PNN and DeepFM. Note that all existing factorization models simply enumerate all 2nd-order feature interactions, many of which are useless and noisy.
References
  • Gabriel Bender. 2019. Understanding and simplifying one-shot architecture search. In CVPR.
  • Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H. Chi. 2018. Latent Cross: Making Use of Context in Recurrent Recommender Systems. In WSDM. 46–54.
  • Shih-Kang Chao and Guang Cheng. 2019. A generalization of regularized dual averaging and its dynamics. CoRR abs/1909.10072 (2019).
  • Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In SIGKDD. 785–794.
  • Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Deepak Chandra, Hrishi Aradhye, Glen Anderson, Greg S Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & Deep Learning for Recommender Systems. In DLRS@RecSys.
  • Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In RecSys. 191–198.
  • Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. In IJCAI. 1725–1731.
  • Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In SIGIR. 355–364.
  • Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, and Stuart Bowers. 2014. Practical Lessons from Predicting Clicks on Ads at Facebook. In ADKDD@KDD. 5:1–5:9.
  • Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359–366.
  • Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In ICML. 448–456.
  • Yu-Chin Juan, Yong Zhuang, Wei-Sheng Chin, and Chih-Jen Lin. 2016. Field-aware Factorization Machines for CTR Prediction. In RecSys.
  • Yu-Chin Juan, Yong Zhuang, and Wei-Sheng Chin. 2014. 3 Idiots' Approach for Display Advertising Challenge. https://www.csie.ntu.edu.tw/r01922136/kaggle2014-criteo.pdf.
  • Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.
  • Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems. In KDD.
  • Bin Liu, Ruiming Tang, Yingzhi Chen, Jinkai Yu, Huifeng Guo, and Yuzhou Zhang. 2019. Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction. In WWW. ACM, 1119–1129.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2019. DARTS: Differentiable Architecture Search. In ICLR.
  • Yuanfei Luo, Mengshuo Wang, Hao Zhou, Quanming Yao, Wei-Wei Tu, Yuqiang Chen, Wenyuan Dai, and Qiang Yang. 2019. AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications. In KDD. 1936–1945.
  • H. Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, and Jeremy Kubica. 2013. Ad click prediction: a view from the trenches. In KDD.
  • Junwei Pan, Jian Xu, Alfonso Lobos Ruiz, Wenliang Zhao, Shengjun Pan, Yu Sun, and Quan Lu. 2018. Field-weighted factorization machines for click-through rate prediction in display advertising. In WWW. 1349–1357.
  • Yanru Qu, Bohui Fang, Weinan Zhang, Ruiming Tang, Minzhe Niu, Huifeng Guo, Yong Yu, and Xiuqiang He. 2019. Product-Based Neural Networks for User Response Prediction over Multi-Field Categorical Data. ACM Trans. Inf. Syst. 37, 1 (2019), 5:1–5:35.
  • Quanming Yao, Xiangning Chen, James Kwok, Yong Li, and Cho-Jui Hsieh. 2020. Efficient Neural Interaction Function Search for Collaborative Filtering. In WWW.
  • Steffen Rendle. 2010. Factorization Machines. In ICDM. 995–1000.
  • Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah. 2017. Failures of Gradient-Based Deep Learning. In ICML.
  • Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. In ADKDD@KDD.
  • Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. In IJCAI. 3119–3125.
  • Guorui Zhou, Xiaoqiang Zhu, Chengru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. In KDD. 1059–1068.