# Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

ICML, pp. 4060-4070, 2020.

Abstract:

In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contri...

Introduction

- In E-commerce, online advertising plays an essential role for merchants to reach their target users, in which Real-time Bidding (RTB) (Zhang et al, 2014; 2016; Zhu et al, 2017) is an important mechanism.
- In RTB, advertisers are allowed to bid for every individual ad impression opportunity.
- Each advertiser offers a bid based on the impression value and competes with other bidders in real time.
- The advertiser with the highest bid wins the auction, displays its ad, and enjoys the impression value.
- Displaying an ad incurs a cost: in a Generalized Second-Price (GSP) auction (Edelman et al, 2007), the winner is charged a fee according to the second-highest bid.
- The typical advertising objective for an advertiser is to maximize its cumulative revenue from winning impressions over a time period under a fixed budget constraint.
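
The auction rule described above can be sketched in a few lines. The following is a minimal illustration for a single ad slot, where the GSP rule reduces to a plain second-price auction. The function name `run_second_price_auction` and the bidder names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def run_second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, price) for a single-slot second-price auction.

    Assumption: one ad slot, at least two bidders, ties broken by sort order.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # The winner pays the second-highest bid, not its own bid.
    price = ranked[1][1]
    return winner, price


# Hypothetical bids from three advertisers.
winner, price = run_second_price_auction({"ad_a": 2.5, "ad_b": 1.8, "ad_c": 0.9})
```

Here `ad_a` wins with the highest bid but is charged the runner-up's bid, which is why the cost of an impression depends on the competition rather than on the winner's own offer.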

Highlights

- To address the above challenges, we propose a novel bilevel optimization framework: Multi-channel Sequential Budget Constrained Bidding (MSBCB), which transforms the original bilevel optimization problem into an equivalent two-level optimization with a significantly reduced search space
- We present the overall Multi-channel Sequential Budget Constrained Bidding framework in Algorithm 1, which involves a two-level sequential optimization process
- We reduce the magnitude of the continuous action space to a binary one by making full use of the prior knowledge in advertising, which greatly improves the sample utilization of the Reinforcement Learning approaches
- We propose an action space reduction approach to significantly increase the learning efficiency of the lower level
- Extensive offline experimental analysis and online A/B testing demonstrate the superior performance of our Multi-channel Sequential Budget Constrained Bidding over the state-of-the-art baselines in terms of cumulative revenue

Methods

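- Rows recovered from Table 1 (cumulative value, cost, value improvement over Contextual Bandit): Constrained + PPO: 61890.92, 11954.07, -17.63±16.11%; Constrained + DDPG: 74259.12, 11996.12, -1.19±3.66%; Constrained + DQN: 70662.65, 11881.12, -5.96±7.83%. A further figure, 93.70±2.36%, follows the Constrained + DQN row.
- The other approaches compared are Greedy + PPO, Greedy + DDPG, Greedy + DQN, MSBCB, and Offline Optimal.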
- The complete comparisons of all approaches are shown in Table 1.
- Instead of using an RL approach, MSBCB can find the policy $\pi_i$ that maximizes $V_G(i \mid \pi_i) - \mathrm{CPR}_{thr} \cdot V_C(i \mid \pi_i)$.
- MSBCB is very close to the optimal solution, reaching an approximation ratio of 99.96%.
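
The selection rule in the bullets above can be illustrated as follows. This is a minimal sketch, assuming each user has a small, enumerable set of candidate policies whose long-term value V_G and cost V_C have already been estimated. The `PolicyEstimate` type, the `select_policy` function, and all numbers are hypothetical and only serve to show how the threshold changes the choice.

```python
from typing import List, NamedTuple


class PolicyEstimate(NamedTuple):
    name: str
    value: float  # estimated long-term value V_G(i | pi_i)
    cost: float   # estimated long-term cost  V_C(i | pi_i)


def select_policy(candidates: List[PolicyEstimate], cpr_thr: float) -> PolicyEstimate:
    """Pick the candidate maximizing V_G - CPR_thr * V_C for one user."""
    return max(candidates, key=lambda p: p.value - cpr_thr * p.cost)


# Hypothetical candidate policies for a single user.
candidates = [
    PolicyEstimate("never_bid", 0.0, 0.0),
    PolicyEstimate("bid_once", 4.0, 1.0),
    PolicyEstimate("bid_always", 9.0, 4.0),
]

loose = select_policy(candidates, cpr_thr=1.0)   # cost is cheap relative to value
medium = select_policy(candidates, cpr_thr=2.0)
strict = select_policy(candidates, cpr_thr=5.0)  # cost is heavily penalized
```

As the threshold grows, the selected policy moves from the expensive high-value option toward spending nothing on this user, which is the behavior one would expect from a threshold that prices each unit of cost.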

Results

- The authors conduct an extensive analysis of MSBCB in five aspects. All approaches aim to maximize the advertiser's cumulative revenue under a fixed budget constraint.
- Greedy with maximized CPR: this approach is similar to the method under the Greedy framework, except that each πi is optimized by maximizing the long-term CPR.
- The authors enumerate all policies for each user and select the one that maximizes its CPR.
- This approach is named Greedy+maxCPR.
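
The per-user step of the Greedy+maxCPR baseline can be sketched as below. This is an illustration only, assuming that CPR denotes the ratio of a policy's long-term value to its long-term cost and that the candidate policies are given as `(name, value, cost)` tuples. The function name `max_cpr_policy` and the example numbers are hypothetical.

```python
from typing import List, Optional, Tuple

Policy = Tuple[str, float, float]  # (name, long-term value, long-term cost)


def max_cpr_policy(candidates: List[Policy]) -> Optional[Policy]:
    """Return the candidate with the highest value/cost ratio.

    Zero-cost candidates are skipped because their ratio is undefined;
    None is returned when no candidate has a positive cost.
    """
    priced = [p for p in candidates if p[2] > 0]
    if not priced:
        return None
    return max(priced, key=lambda p: p[1] / p[2])


# Hypothetical enumerated policies for one user.
user_policies = [
    ("never_bid", 0.0, 0.0),
    ("bid_once", 4.0, 1.0),     # ratio 4.0
    ("bid_always", 9.0, 4.0),   # ratio 2.25
]
best = max_cpr_policy(user_policies)
```

In this example the ratio-maximizing choice is the cheap policy, even though the expensive one yields more total value; a pure ratio criterion does not take the remaining budget into account.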

Conclusion

- The authors formulate the multi-channel sequential advertising problem as a Dynamic Knapsack Problem, whose target is to maximize the long-term cumulative revenue over a period of time under a budget constraint.
- The authors decompose the original problem into an easier bilevel optimization, which significantly reduces the solution space.
- Extensive offline experimental analysis and online A/B testing demonstrate the superior performance of MSBCB over the state-of-the-art baselines in terms of cumulative revenue.
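
The knapsack view of budget-constrained advertising can be made concrete with the classic greedy heuristic that ranks items by value per unit cost (in the spirit of Dantzig, 1957, listed in the references). This is a simplified static sketch and not the paper's dynamic MSBCB algorithm: it assumes each user has one fixed, known value and a positive cost. The function name `greedy_knapsack` and the data are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[str, float, float]  # (user id, value, cost); cost must be > 0


def greedy_knapsack(items: List[Item], budget: float) -> Tuple[List[str], float, float]:
    """Select users in decreasing value/cost order while the budget allows.

    Users whose cost no longer fits are skipped, and the scan continues.
    """
    ranked = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    chosen: List[str] = []
    total_value = 0.0
    total_cost = 0.0
    for user, value, cost in ranked:
        if total_cost + cost <= budget:
            chosen.append(user)
            total_value += value
            total_cost += cost
    return chosen, total_value, total_cost


# Hypothetical users with (value, cost) estimates and a budget of 6.
users = [("u1", 6.0, 2.0), ("u2", 5.0, 5.0), ("u3", 8.0, 4.0)]
chosen, total_value, total_cost = greedy_knapsack(users, budget=6.0)
```

With these numbers the heuristic selects `u1` and `u3` and leaves `u2` out because the remaining budget cannot cover it.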

Summary

## Objectives:

Given a threshold $\mathrm{CPR}_{thr}$ as input, the authors aim to acquire the optimal advertising policy $\pi_i^*$ defined in Equation (5) of the paper.

- Table 1: Cumulative values, costs, value improvements (over Contextual Bandit) and the approximation ratio of all approaches
- Table 2: The training epochs and the number of samples needed by different approaches when achieving the same revenue level
- Table 3: The overall performance comparisons of the A/B testing
- Table 4: Detailed comparison between an ad's total budget and cost on a user sequence
- Table 5: Optimal types of each πi∗ of 10000 users
- Table 6
- Table 7: The improvements in Revenue, CVR, PV and ROI of our MSBCB compared with the myopic Contextual Bandit method

Funding

- The work is supported by the National Natural Science Foundation of China (Grant Nos.: 61702362, U1836214), the Special Program of Artificial Intelligence and the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission (No.: 569 17ZXRGGX00150) and the Alibaba Group through Alibaba Innovative Research Program

Reference

- Altman, E. Constrained Markov decision processes, volume 7. CRC Press, 1999.
- Astrom, K. J. and Hagglund, T. PID controllers: theory, design, and tuning. Instrument Society of America, Research Triangle Park, NC, 1995.
- Boutilier, C. and Lu, T. Budget allocation using weakly coupled, constrained markov decision processes. 2016.
- Cai, H., Ren, K., Zhang, W., Malialis, K., Wang, J., Yu, Y., and Guo, D. Real-time bidding by reinforcement learning in display advertising. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 661–670. ACM, 2017.
- Dantzig, G. B. Discrete-variable extremum problems. Operations research, 5(2):266–288, 1957.
- Du, R., Zhong, Y., Nair, H., Cui, B., and Shou, R. Causally driven incremental multi touch attribution using a recurrent neural network. arXiv preprint arXiv:1902.00215, 2019.
- Edelman, B., Ostrovsky, M., and Schwarz, M. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review, 97(1):242–259, 2007.
- Ie, E., Jain, V., Wang, J., Navrekar, S., Agarwal, R., Wu, R., Cheng, H.-T., Lustman, M., Gatto, V., Covington, P., et al. Reinforcement learning for slate-based recommender systems: A tractable decomposition and practical methodology. arXiv preprint arXiv:1905.12767, 2019.
- Jin, J., Song, C., Li, H., Gai, K., Wang, J., and Zhang, W. Real-time bidding with multi-agent reinforcement learning in display advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2193–2201. ACM, 2018.
- Li, P., Hawbani, A., et al. An efficient budget allocation algorithm for multi-channel advertising. In 2018 24th International Conference on Pattern Recognition (ICPR), pp. 886–891. IEEE, 2018.
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
- Martello, S., Pisinger, D., and Toth, P. Dynamic programming and strong bounds for the 0-1 knapsack problem. Management Science, 45(3):414–424, 1999.
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- Nuara, A., Sosio, N., Trovò, F., Zaccardi, M. C., Gatti, N., and Restelli, M. Dealing with interdependencies and uncertainty in multi-channel advertising campaigns optimization. In The World Wide Web Conference, pp. 1376–1386. ACM, 2019.
- Ren, K., Zhang, W., Chang, K., Rong, Y., Yu, Y., and Wang, J. Bidding machine: Learning to bid for directly optimizing profits in display advertising. IEEE Transactions on Knowledge and Data Engineering, 30(4):645–659, 2017.
- Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., and Wang, J. Learning multi-touch conversion attribution with dual-attention mechanisms for online advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1433–1442. ACM, 2018.
- Ren, K., Qin, J., Zheng, L., Yang, Z., Zhang, W., and Yu, Y. Deep landscape forecasting for real-time bidding advertising. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 363–372. ACM, 2019.
- Roberge, M. The Sales Acceleration Formula: Using Data, Technology, and Inbound Selling to go from $0 to $100 Million. John Wiley & Sons, 2015.
- Ji, W. and Wang, X. Additional multi-touch attribution for online advertising. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
