Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

Zhaoqing Peng
Yi Ma
Guan Wang
Junqi Jin
Shan Chen
Rongquan Bai
Mingzhou Xie
Miao Xu
Chuan Yu

ICML, pp. 4060-4070, 2020.

Keywords:
bilevel optimization, multi channel, sequential advertising, action space, channel sequential

Abstract:

In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contri...

Introduction
  • In E-commerce, online advertising plays an essential role for merchants to reach their target users, in which Real-time Bidding (RTB) (Zhang et al, 2014; 2016; Zhu et al, 2017) is an important mechanism.
  • In RTB, advertisers are allowed to bid for every individual ad impression opportunity.
  • Each advertiser offers a bid based on the impression value and competes with other bidders in real time.
  • The advertiser with the highest bid wins the auction, displays its ad, and enjoys the impression value.
  • Displaying an ad incurs a cost: in a Generalized Second-Price (GSP) auction (Edelman et al, 2007), the winner is charged a fee according to the second-highest bid.
  • The typical objective for an advertiser is to maximize the cumulative revenue of its winning impressions over a time period under a fixed budget constraint.
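The auction rule described above (highest bid wins, the winner pays according to the second-highest bid) can be illustrated with a minimal single-slot sketch. The function name, the bidder identifiers, and the choice to charge 0.0 when there is only one bidder are assumptions made for illustration; a production GSP auction typically adds reserve prices, quality scores, and multiple slots.

```python
from typing import Dict, Optional, Tuple


def run_second_price_auction(bids: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (winner, price) for a single-slot second-price auction.

    Hypothetical helper: the winner is the highest bidder and pays the
    second-highest bid. With a single bidder the price is assumed to be 0.0
    (no reserve price is modeled here).
    """
    if not bids:
        return None
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price


print(run_second_price_auction({"ad_a": 2.5, "ad_b": 1.8, "ad_c": 0.9}))
# ('ad_a', 1.8)
```

The winner's cost therefore depends on the competing bids rather than on its own bid, which is why the budget consumed by a sequence of impressions has to be estimated rather than read off the bids directly.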
Highlights
  • In E-commerce, online advertising plays an essential role for merchants to reach their target users, and Real-time Bidding (RTB) (Zhang et al, 2014; 2016; Zhu et al, 2017) is an important mechanism for it
  • Displaying an ad incurs a cost: in a Generalized Second-Price (GSP) auction (Edelman et al, 2007), the winner is charged a fee according to the second-highest bid
  • To address the above challenges, we propose a novel bilevel optimization framework: Multi-channel Sequential Budget Constrained Bidding (MSBCB), which transforms the original bilevel optimization problem into an equivalent two-level optimization with significantly reduced searching space
  • We present the overall Multi-channel Sequential Budget Constrained Bidding framework in Algorithm 1, which involves a two-level sequential optimization process
  • We reduce the magnitude of the continuous action space to a binary one by making full use of the prior knowledge in advertising, which greatly improves the sample utilization of the Reinforcement Learning approaches
  • We propose an action space reduction approach that significantly increases the learning efficiency of the lower level
  • Extensive offline experimental analysis and online A/B testing demonstrate the superior performance of our Multi-channel Sequential Budget Constrained Bidding over state-of-the-art baselines in terms of cumulative revenue
Methods
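  • Excerpt from Table 1 (approach: cumulative value, cost, value improvement over Contextual Bandit; an approximation ratio survives only on the last row):
      Constrained + PPO: 61890.92, 11954.07, -17.63±16.11%
      Constrained + DDPG: 74259.12, 11996.12, -1.19±3.66%
      Constrained + DQN: 70662.65, 11881.12, -5.96±7.83%, 93.70±2.36%
  • The other approaches compared are Greedy + PPO, Greedy + DDPG, Greedy + DQN, MSBCB, and the Offline Optimal.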
  • The complete comparisons of all approaches are shown in Table 1.
  • Instead of using the RL approach, MSBCB can directly find the policy π_i that maximizes V_G(i|π_i) − CPR_thr · V_C(i|π_i).
  • MSBCB is very close to the optimal solution, reaching an approximation ratio of 99.96%.
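The selection rule above, picking the policy that maximizes V_G(i|π_i) − CPR_thr · V_C(i|π_i), can be sketched for one user as follows. This is a minimal illustration, not the authors' implementation: it assumes that a small set of candidate policies with already-estimated long-term value and cost is available, and the names `PolicyEstimate` and `select_policy` as well as the numbers are hypothetical.

```python
from typing import List, NamedTuple


class PolicyEstimate(NamedTuple):
    """Hypothetical container for one candidate policy of a single user."""
    name: str
    value: float  # estimated long-term value, V_G(i | pi_i)
    cost: float   # estimated long-term cost, V_C(i | pi_i)


def select_policy(candidates: List[PolicyEstimate], cpr_thr: float) -> PolicyEstimate:
    """Pick the candidate maximizing value - cpr_thr * cost."""
    return max(candidates, key=lambda p: p.value - cpr_thr * p.cost)


# Illustrative numbers only (not taken from the paper).
candidates = [
    PolicyEstimate("never_bid", 0.0, 0.0),
    PolicyEstimate("bid_once", 5.0, 2.0),
    PolicyEstimate("bid_always", 8.0, 5.0),
]

print(select_policy(candidates, cpr_thr=1.5).name)  # bid_once
print(select_policy(candidates, cpr_thr=0.5).name)  # bid_always
```

A higher threshold penalizes cost more strongly, so the selected policy shifts toward cheaper behavior; a lower threshold lets the more expensive, higher-value policy win.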
Results
  • The authors analyze MSBCB extensively along five aspects. All compared approaches aim to maximize the advertiser's cumulative revenue under a fixed budget constraint.
  • Greedy with maximized CPR: This approach is similar to the method under the Greedy framework, except that each π_i is optimized by maximizing the long-term CPR.
  • The authors enumerate all policies for each user and select the one that maximizes the user's CPR.
  • This approach is named Greedy+maxCPR.
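The Greedy+maxCPR baseline described above can be sketched roughly as follows. Several points are assumptions rather than details given in this summary: CPR is taken to be the ratio of a policy's long-term value to its long-term cost, the greedy step follows the classic ratio-ordered knapsack heuristic (users sorted by CPR and added while the budget allows), and every user is assumed to have at least one candidate policy with positive cost. All names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Policy = Tuple[str, float, float]  # (policy name, long-term value, long-term cost)


def max_cpr_policy(policies: List[Policy]) -> Policy:
    """Enumerate a user's policies and keep the one with the highest value/cost ratio."""
    paid = [p for p in policies if p[2] > 0]  # skip zero-cost policies (ratio undefined)
    return max(paid, key=lambda p: p[1] / p[2])


def greedy_max_cpr(
    users: Dict[str, List[Policy]], budget: float
) -> Tuple[List[Tuple[str, str]], float, float]:
    """Return (selected (user, policy) pairs, total revenue, total spend)."""
    chosen = {u: max_cpr_policy(ps) for u, ps in users.items()}
    # Visit users in descending order of the CPR of their chosen policy.
    order = sorted(chosen, key=lambda u: chosen[u][1] / chosen[u][2], reverse=True)
    selected, revenue, spent = [], 0.0, 0.0
    for u in order:
        name, value, cost = chosen[u]
        if spent + cost <= budget:  # take the user only if the budget still covers it
            selected.append((u, name))
            revenue += value
            spent += cost
    return selected, revenue, spent


# Illustrative numbers only (not taken from the paper).
users = {
    "u1": [("a", 4.0, 1.0), ("b", 9.0, 3.0)],
    "u2": [("a", 3.0, 2.0), ("b", 5.0, 4.0)],
    "u3": [("a", 6.0, 2.0)],
}
print(greedy_max_cpr(users, budget=4.0))
# ([('u1', 'a'), ('u3', 'a')], 10.0, 3.0)
```

Maximizing the ratio per user is a different criterion from maximizing V_G − CPR_thr · V_C, so the two can pick different policies for the same user.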
Conclusion
  • The authors formulate the multi-channel sequential advertising problem as a Dynamic Knapsack Problem, whose target is to maximize the long-term cumulative revenue over a period of time under a budget constraint.
  • The authors decompose the original problem into an easier bilevel optimization, which significantly reduces the solution space.
  • Extensive offline experimental analysis and online A/B testing demonstrate the superior performance of MSBCB over state-of-the-art baselines in terms of cumulative revenue.
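The knapsack view stated in the conclusion can be written out as a hedged sketch using the summary's own symbols V_G, V_C, π_i, and CPR_thr. The budget symbol B and the binary selection variable x_i are assumed notation introduced here for illustration; the exact formulation in the paper may differ.

```latex
% Upper level (assumed notation): choose which users to advertise to under budget B.
\begin{aligned}
\max_{\{x_i\},\,\{\pi_i\}} \quad & \sum_{i} x_i \, V_G(i \mid \pi_i) \\
\text{s.t.} \quad & \sum_{i} x_i \, V_C(i \mid \pi_i) \le B, \\
& x_i \in \{0, 1\} \quad \text{for every user } i .
\end{aligned}

% Lower level: for a given threshold, each user's policy maximizes the
% thresholded objective quoted in the Methods section.
\pi_i^{*} = \arg\max_{\pi_i} \; \bigl[ V_G(i \mid \pi_i) - \mathrm{CPR}_{thr} \cdot V_C(i \mid \pi_i) \bigr]
```

Read this way, each user is an item whose value and weight both depend on the policy chosen for that user, which is what makes the knapsack dynamic.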
Summary
  • Objectives:

    Given a threshold CPRthr as input, the authors aim to acquire the optimal advertising policy πi∗ defined in Equation (5) of.
Tables
  • Table 1: Cumulative values, costs, value improvements (over Contextual Bandit) and the approximation ratio of all approaches
  • Table 2: The training epochs and the number of samples needed by different approaches when achieving the same revenue level
  • Table 3: The overall performance comparisons of the A/B testing
  • Table 4: Detailed comparison between an ad's total budget and cost on a user sequence
  • Table 5: Optimal types of each π_i* of 10000 users
  • Table 6
  • Table 7: The improvements in Revenue, CVR, PV and ROI of our MSBCB compared with the myopic Contextual Bandit method
Funding
  • The work is supported by the National Natural Science Foundation of China (Grant Nos.: 61702362, U1836214), the Special Program of Artificial Intelligence and the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission (No.: 569 17ZXRGGX00150) and the Alibaba Group through Alibaba Innovative Research Program
Reference
  • Altman, E. Constrained Markov decision processes, volume 7. CRC Press, 1999.
  • Astrom, K. J. and Hagglund, T. PID controllers: theory, design, and tuning, volume Instrument society of America Research Triangle Park, NC, 1995.
  • Boutilier, C. and Lu, T. Budget allocation using weakly coupled, constrained markov decision processes. 2016.
  • Cai, H., Ren, K., Zhang, W., Malialis, K., Wang, J., Yu, Y., and Guo, D. Real-time bidding by reinforcement learning in display advertising. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 661–670. ACM, 2017.
  • Dantzig, G. B. Discrete-variable extremum problems. Operations research, 5(2):266–288, 1957.
  • Du, R., Zhong, Y., Nair, H., Cui, B., and Shou, R. Causally driven incremental multi touch attribution using a recurrent neural network. arXiv preprint arXiv:1902.00215, 2019.
  • Edelman, B., Ostrovsky, M., and Schwarz, M. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review, 97(1):242–259, 2007.
  • Ie, E., Jain, V., Wang, J., Navrekar, S., Agarwal, R., Wu, R., Cheng, H.-T., Lustman, M., Gatto, V., Covington, P., et al. Reinforcement learning for slate-based recommender systems: A tractable decomposition and practical methodology. arXiv preprint arXiv:1905.12767, 2019.
  • Jin, J., Song, C., Li, H., Gai, K., Wang, J., and Zhang, W. Real-time bidding with multi-agent reinforcement learning in display advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2193–2201. ACM, 2018.
  • Li, P., Hawbani, A., et al. An efficient budget allocation algorithm for multi-channel advertising. In 2018 24th International Conference on Pattern Recognition (ICPR), pp. 886–891. IEEE, 2018.
  • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
  • Martello, S., Pisinger, D., and Toth, P. Dynamic programming and strong bounds for the 0-1 knapsack problem. Management Science, 45(3):414–424, 1999.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
  • Nuara, A., Sosio, N., Trovò, F., Zaccardi, M. C., Gatti, N., and Restelli, M. Dealing with interdependencies and uncertainty in multi-channel advertising campaigns optimization. In The World Wide Web Conference, pp. 1376–1386. ACM, 2019.
  • Ren, K., Zhang, W., Chang, K., Rong, Y., Yu, Y., and Wang, J. Bidding machine: Learning to bid for directly optimizing profits in display advertising. IEEE Transactions on Knowledge and Data Engineering, 30(4):645–659, 2017.
  • Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., and Wang, J. Learning multi-touch conversion attribution with dual-attention mechanisms for online advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1433–1442. ACM, 2018.
  • Ren, K., Qin, J., Zheng, L., Yang, Z., Zhang, W., and Yu, Y. Deep landscape forecasting for real-time bidding advertising. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 363–372. ACM, 2019.
  • Roberge, M. The Sales Acceleration Formula: Using Data, Technology, and Inbound Selling to Go from $0 to $100 Million. John Wiley & Sons, 2015.
  • Ji, W. and Wang, X. Additional multi-touch attribution for online advertising. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.