# Learning Deep Features in Instrumental Variable Regression

ICLR 2021.

Abstract:

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables by utilizing an instrumental variable, which is conditionally independent of the outcome given the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment, and stage 2 performs linear regression from the treatment to the outcome.
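The classical two-stage procedure from the abstract can be sketched on synthetic confounded data. The data-generating process below is our own illustration, not one from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
u = rng.normal(size=n)                             # unobserved confounder
z = rng.normal(size=n)                             # instrument, independent of u
x = z + u + 0.1 * rng.normal(size=n)               # treatment, confounded by u
y = 2.0 * x + 2.0 * u + 0.1 * rng.normal(size=n)   # true causal effect of x is 2.0

# Naive OLS of y on x is biased, because u drives both x and y.
X = np.column_stack([np.ones(n), x])
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: linear regression from the instrument to the treatment.
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: linear regression from the stage-1 fitted treatment to the outcome.
Xh = np.column_stack([np.ones(n), x_hat])
tsls = np.linalg.lstsq(Xh, y, rcond=None)[0]

print(f"OLS slope:  {ols[1]:.2f}")   # biased upward, roughly 3.0
print(f"2SLS slope: {tsls[1]:.2f}")  # close to the true 2.0
```

Stage 1 strips out the part of the treatment that is correlated with the confounder, so the stage-2 regression recovers the causal slope rather than the confounded one.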

Introduction

- The aim of supervised learning is to obtain a model based on samples observed from some data generating process, and to make predictions about new samples generated from the same distribution.
- If the goal is to predict the effect of actions on the world, the aim becomes to assess the influence of interventions on this data generating process.
- To answer such causal questions, a supervised learning approach is inappropriate, since the interventions, called treatments, may affect the underlying distribution of the variable of interest, which is called the outcome.
- In the paper's running example of predicting airplane ticket demand from ticket price, the time of year is a confounder, since it affects both sales and prices, and the authors must correct for the bias it causes

Highlights

- The aim of supervised learning is to obtain a model based on samples observed from some data generating process, and to make predictions about new samples generated from the same distribution
- We propose Deep Feature Instrumental Variable Regression (DFIV), which aims to combine the advantages of all previous approaches, while avoiding their limitations
- We empirically show that DFIV performs better than other methods on several IV benchmarks, and apply DFIV successfully to off-policy policy evaluation, a fundamental problem in reinforcement learning (RL)
- We have proposed a novel method for instrumental variable regression, Deep Feature IV (DFIV), which performs two-stage least squares regression on flexible and expressive features of the instrument and treatment
- As a contribution to the IV literature, we showed how to adaptively learn these feature maps with deep neural networks
- We showed that the off-policy policy evaluation (OPE) problem in deep RL can be interpreted as a nonlinear IV regression, and that DFIV performs competitively in this domain
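As a minimal sketch of the two-stage idea behind DFIV, the snippet below runs both stages in closed form on fixed nonlinear feature maps. In DFIV proper, these maps are neural networks trained by alternating the two stages; the data-generating process, feature choices, and ridge parameters here are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
u = rng.normal(size=n)                            # hidden confounder
z = rng.uniform(-3, 3, size=n)                    # instrument, independent of u
x = z + u + 0.1 * rng.normal(size=n)              # treatment
y = x**2 + 2.0 * u + 0.1 * rng.normal(size=n)     # true structural function f(x) = x^2

# Hypothetical fixed feature maps standing in for the learned networks.
psi = lambda v: np.column_stack([np.ones_like(v), v, v**2])  # treatment features
phi = lambda v: np.column_stack([np.ones_like(v), v, v**2])  # instrument features

Psi, Phi = psi(x), phi(z)
lam1 = lam2 = 1e-4 * n

# Stage 1: ridge regression from instrument features to treatment features,
# estimating the conditional expectation E[psi(X) | Z].
V = np.linalg.solve(Phi.T @ Phi + lam1 * np.eye(3), Phi.T @ Psi)
Psi_hat = Phi @ V

# Stage 2: ridge regression from the stage-1 predictions to the outcome.
w = np.linalg.solve(Psi_hat.T @ Psi_hat + lam2 * np.eye(3), Psi_hat.T @ y)

# Naive ridge regression of y on psi(x), biased by the confounder u.
w_naive = np.linalg.solve(Psi.T @ Psi + lam2 * np.eye(3), Psi.T @ y)

grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
err_dfiv = np.max(np.abs(psi(grid) @ w - grid**2))
err_naive = np.max(np.abs(psi(grid) @ w_naive - grid**2))
print(f"max error, two-stage on features: {err_dfiv:.3f}")   # typically small
print(f"max error, naive ridge:           {err_naive:.3f}")  # typically large
```

The naive fit absorbs the confounder into a spurious linear term, while the two-stage fit recovers the quadratic structural function; DFIV's contribution is to learn ψ and φ with deep networks instead of fixing them by hand.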

Methods

- The authors report the empirical performance of the DFIV method.
- The evaluation considers both low- and high-dimensional treatment variables.
- In the deep RL context, the authors apply DFIV to perform off-policy policy evaluation (OPE).
- The algorithms in the first two experiments are implemented using PyTorch (Paszke et al., 2019) and the OPE experiments are implemented using TensorFlow (Abadi et al., 2015) and the Acme RL framework (Hoffman et al., 2020).

Conclusion

- The authors have proposed a novel method for instrumental variable regression, Deep Feature IV (DFIV), which performs two-stage least squares regression on flexible and expressive features of the instrument and treatment.
- As a contribution to the IV literature, the authors showed how to adaptively learn these feature maps with deep neural networks.
- The authors showed that the off-policy policy evaluation (OPE) problem in deep RL can be interpreted as a nonlinear IV regression, and that DFIV performs competitively in this domain.
- In RL, problems with additional confounders are common, see e.g. (Namkoong et al, 2020; Shang et al, 2019), and the authors believe that adapting DFIV to this setting will be of great value
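The OPE-as-IV reinterpretation can be sketched as follows; the notation is ours and may differ from the paper's. For a target policy \(\pi\), the value function satisfies the Bellman equation, which can be read as the conditional moment restriction of IV regression:

```latex
% Bellman equation for the target policy \pi:
Q^{\pi}(s, a) = \mathbb{E}\!\left[ r + \gamma\, Q^{\pi}\big(s', \pi(s')\big) \,\middle|\, s, a \right]

% IV reading: the reward r is the outcome Y, the current pair (s, a) is the
% instrument Z, the transition (s, a, s') is the treatment X, and the
% structural function is
f(s, a, s') = Q^{\pi}(s, a) - \gamma\, Q^{\pi}\big(s', \pi(s')\big),

% so the Bellman equation becomes the IV moment condition
\mathbb{E}\!\left[\, Y - f(X) \,\middle|\, Z \,\right] = 0 .
```

Under this reading, running a nonlinear IV method such as DFIV on logged transitions estimates \(Q^{\pi}\) without requiring on-policy data.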

Summary

## Introduction:

- The aim of supervised learning is to obtain a model based on samples observed from some data generating process, and to make predictions about new samples generated from the same distribution.
- If the goal is to predict the effect of actions on the world, the aim becomes to assess the influence of interventions on this data generating process.
- To answer such causal questions, a supervised learning approach is inappropriate, since the interventions, called treatments, may affect the underlying distribution of the variable of interest, which is called the outcome.
- In the paper's running example of predicting airplane ticket demand from ticket price, the time of year is a confounder, since it affects both sales and prices, and the authors must correct for the bias it causes
## Objectives:

- If the goal is to predict the effect of actions on the world, the aim becomes to assess the influence of interventions on this data generating process.
- The authors aim to predict the demand for airplane tickets Y given the price of the tickets P
## Methods:

- The authors report the empirical performance of the DFIV method.
- The evaluation considers both low- and high-dimensional treatment variables.
- In the deep RL context, the authors apply DFIV to perform off-policy policy evaluation (OPE).
- The algorithms in the first two experiments are implemented using PyTorch (Paszke et al., 2019) and the OPE experiments are implemented using TensorFlow (Abadi et al., 2015) and the Acme RL framework (Hoffman et al., 2020).
## Conclusion:

- The authors have proposed a novel method for instrumental variable regression, Deep Feature IV (DFIV), which performs two-stage least squares regression on flexible and expressive features of the instrument and treatment.
- As a contribution to the IV literature, the authors showed how to adaptively learn these feature maps with deep neural networks.
- The authors showed that the off-policy policy evaluation (OPE) problem in deep RL can be interpreted as a nonlinear IV regression, and that DFIV performs competitively in this domain.
- In RL, problems with additional confounders are common, see e.g. (Namkoong et al, 2020; Shang et al, 2019), and the authors believe that adapting DFIV to this setting will be of great value

References

- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/.
- J. D. Angrist. Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. The American Economic Review, 80(3):313–336, 1990.
- J. D. Angrist and A. B. Krueger. Split-sample instrumental variables estimates of the return to schooling. Journal of Business & Economic Statistics, 13(2):225–235, 1995.
- J. D. Angrist, G. W. Imbens, and D. B. Rubin. Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–455, 1996.
- J. D. Angrist, G. W. Imbens, and A. B. Krueger. Jackknife instrumental variables estimation. Journal of Applied Econometrics, 14(1):57–67, 1999.
- L. Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning, 1995.
- E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: Z-identifiability. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, page 113–120, 2012.
- A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, (5):834–846, 1983.
- A. Bennett, N. Kallus, and T. Schnabel. Deep generalized method of moments for instrumental variable analysis. In Advances in Neural Information Processing Systems 32, pages 3564– 3574. 2019.
- R. Blundell, J. Horowitz, and M. Parey. Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3:29–51, 2012.
- S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33–57, 1996.
- M. Carrasco, J.-P. Florens, and E. Renault. Linear inverse problems in structural econometrics estimation based on spectral decomposition and regularization. In Handbook of Econometrics, volume 6B, chapter 77. 2007.
- X. Chen and T. M. Christensen. Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression: Nonlinear functionals of nonparametric IV. Quantitative Economics, 9:39–84, 2018.
- S. Darolles, Y. Fan, J. P. Florens, and E. Renault. Nonparametric instrumental regression. Econometrica, 79(5):1541–1565, 2011.
- D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6(Apr):503–556, 2005.
- C. Hansen and D. Kozbur. Instrumental variables estimation with many weak instruments using regularized jive. Journal of Econometrics, 182(2):290–308, 2014.
- L. P. Hansen. Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054, 1982.
- J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy. Deep IV: A flexible approach for counterfactual prediction. In Proceedings of the 34th International Conference on Machine Learning, 2017.
- M. Hoffman, B. Shahriari, J. Aslanides, G. Barth-Maron, F. Behbahani, T. Norman, A. Abdolmaleki, A. Cassirer, F. Yang, K. Baumli, S. Henderson, A. Novikov, S. G. Colmenarejo, S. Cabi, C. Gulcehre, T. L. Paine, A. Cowie, Z. Wang, B. Piot, and N. de Freitas. Acme: A research framework for distributed reinforcement learning. arXiv preprint arXiv:2006.00979, 2020.
- H. M. Le, C. Voloshin, and Y. Yue. Batch policy learning under constraints. arXiv preprint arXiv:1903.08738, 2019.
- Y. LeCun and C. Cortes. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.
- L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dSprites: Disentanglement testing sprites dataset, 2017. URL https://github.com/deepmind/dsprites-dataset/.
- T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- A. W. Moore. Efficient Memory-Based Learning for Robot Control. PhD thesis, Cambridge University, 1990.
- K. Muandet, A. Mehrjou, S. K. Lee, and A. Raj. Dual IV: A single stage instrumental variable regression. arXiv preprint arXiv:1910.12358, 2019.
- H. Namkoong, R. Keramati, S. Yadlowsky, and E. Brunskill. Off-policy policy evaluation for sequential decisions under unobserved confounding. arXiv preprint arXiv:2003.05623, 2020.
- M. Z. Nashed and G. Wahba. Generalized inverses in reproducing kernel spaces: An approach to regularization of linear operator equations. SIAM Journal on Mathematical Analysis, 5 (6):974–987, 1974.
- W. K. Newey and J. L. Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71(5):1565–1578, 2003.
- I. Osband, Y. Doron, M. Hessel, J. Aslanides, E. Sezener, A. Saraiva, K. McKinney, T. Lattimore, C. Szepesvari, S. Singh, et al. Behaviour suite for reinforcement learning. In International Conference on Learning Representations, 2019.
- T. L. Paine, C. Paduraru, A. Michi, C. Gulcehre, K. Zolna, A. Novikov, Z. Wang, and N. de Freitas. Hyperparameter selection for offline reinforcement learning. arXiv preprint arXiv:2007.09055, 2020.
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. 2019.
- A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20, pages 1177–1184. 2008.
- W. Shang, Y. Yu, Q. Li, Z. Qin, Y. Meng, and J. Ye. Environment reconstruction with hidden confounders for reinforcement learning based recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 566–576, 2019.
- R. Singh, M. Sahani, and A. Gretton. Kernel instrumental variable regression. In Advances in Neural Information Processing Systems 32, pages 4593–4605. 2019.
- J. H. Stock and F. Trebbi. Retrospectives: Who invented instrumental variable regression? Journal of Economic Perspectives, 17(3):177–194, 2003.
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 2018.
- C. Voloshin, H. M. Le, N. Jiang, and Y. Yue. Empirical study of off-policy policy evaluation for reinforcement learning. arXiv preprint arXiv:1911.06854, 2019.
- M. Wiatrak, S. V. Albrecht, and A. Nystrom. Stabilizing generative adversarial networks: A survey. arXiv preprint arXiv:1910.00927, 2019.
- P. Wright. The Tariff on Animal and Vegetable Oils. Investigations in International Commercial Policies. Macmillan Company, 1928.
