Learning Deep Features in Instrumental Variable Regression

ICLR 2021.

Keywords:
deep feature, novel method, instrumental variable, feature map, linear regression
TL;DR:
We showed that the off-policy policy evaluation problem in deep reinforcement learning can be interpreted as a nonlinear instrumental variable regression, and that deep feature instrumental variable regression performs competitively in this domain.

Abstract:

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables by utilizing an instrumental variable, which affects the outcome only through the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment, and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument.
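To make the two-stage procedure described above concrete, here is a minimal sketch of classical two-stage least squares (2SLS) on simulated data. The variables (a price treatment, a demand outcome, a seasonal confounder, and a cost-shifter instrument) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Illustrative data-generating process (assumed, not from the paper):
# a seasonal confounder u raises both price p and demand y,
# while a cost shifter z moves the price but affects demand only through the price.
u = rng.normal(size=n)                                    # unobserved confounder (e.g. time of year)
z = rng.normal(size=n)                                    # instrument (e.g. a cost shifter)
p = 1.0 * z + 1.0 * u + rng.normal(scale=0.5, size=n)     # treatment: price
y = -2.0 * p + 3.0 * u + rng.normal(scale=0.5, size=n)    # outcome: demand; true effect of p is -2

def ols(X, y):
    """Least-squares coefficients with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Naive regression of y on p is biased by the confounder u.
naive = ols(p[:, None], y)[1]

# Stage 1: linear regression from instrument z to treatment p.
a0, a1 = ols(z[:, None], p)
p_hat = a0 + a1 * z                      # predicted treatment E[p | z]

# Stage 2: linear regression from the predicted treatment to the outcome y.
two_stage = ols(p_hat[:, None], y)[1]

print(f"naive OLS estimate: {naive:+.2f}")    # pulled away from -2 by confounding
print(f"2SLS estimate:      {two_stage:+.2f}")  # close to the true causal effect -2
```

Stage 2 regresses the outcome on the predicted treatment E[P|Z] rather than the observed treatment, which strips out the variation induced by the confounder.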

Introduction
  • The aim of supervised learning is to obtain a model based on samples observed from some data generating process, and to make predictions about new samples generated from the same distribution.
  • If instead the goal is to predict the effect of actions on the world, the aim becomes to assess the influence of interventions on this data generating process.
  • To answer such causal questions, a supervised learning approach is inappropriate, since the interventions, called treatments, may affect the underlying distribution of the variable of interest, which is called the outcome.
  • For example, suppose the aim is to predict the demand for airplane tickets Y given the ticket price P: the time of year is a confounder, since it affects both sales and prices, and the resulting bias must be corrected.
Highlights
  • The aim of supervised learning is to obtain a model based on samples observed from some data generating process, and to make predictions about new samples generated from the same distribution
  • We propose Deep Feature Instrumental Variable Regression (DFIV), which aims to combine the advantages of all previous approaches while avoiding their limitations
  • We empirically show that DFIV performs better than other methods on several instrumental variable (IV) regression benchmarks, and apply it successfully to off-policy policy evaluation (OPE), a fundamental problem in reinforcement learning (RL)
  • DFIV performs two-stage least squares regression on flexible and expressive features of the instrument and treatment (a minimal sketch of this idea follows this list)
  • As a contribution to the IV literature, we show how to adaptively learn these feature maps with deep neural networks
  • We show that the OPE problem in deep RL can be interpreted as a nonlinear IV regression, and that DFIV performs competitively in this domain
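The bullets above describe DFIV as two-stage least squares on learned instrument and treatment features. Below is a minimal PyTorch sketch of that idea on simulated data: small networks phi and psi produce the features, each stage has a closed-form ridge solution, and each feature network is updated by gradient descent on its stage loss. The data-generating process, network sizes, ridge penalties, and alternating update schedule are illustrative assumptions, not the paper's actual DFIV algorithm or hyperparameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 5000

# Simulated confounded data (illustrative assumptions): instrument z, treatment x, outcome y.
u = torch.randn(n, 1)                                   # unobserved confounder
z = torch.randn(n, 1)                                   # instrument
x = torch.sin(z) + u + 0.1 * torch.randn(n, 1)          # treatment depends on instrument and confounder
y = torch.cos(x) + 2.0 * u + 0.1 * torch.randn(n, 1)    # outcome; structural part is cos(x)

def mlp(din, dout):
    return nn.Sequential(nn.Linear(din, 32), nn.ReLU(), nn.Linear(32, dout))

phi = mlp(1, 8)   # instrument feature map phi(z)
psi = mlp(1, 8)   # treatment feature map psi(x)
opt_phi = torch.optim.Adam(phi.parameters(), lr=1e-3)
opt_psi = torch.optim.Adam(psi.parameters(), lr=1e-3)
lam1, lam2 = 1e-3, 1e-3   # ridge penalties (illustrative values)

def ridge(A, B, lam):
    """Closed-form coefficients W minimising (1/m)||A W - B||^2 + lam ||W||^2."""
    m, d = A.shape
    return torch.linalg.solve(A.T @ A + lam * m * torch.eye(d), A.T @ B)

for step in range(2000):
    # Stage 1: regress treatment features psi(x) on instrument features phi(z);
    # the stage-1 loss is used to update the instrument feature network phi.
    opt_phi.zero_grad()
    Phi, Psi = phi(z), psi(x).detach()
    W1 = ridge(Phi, Psi, lam1)
    loss1 = ((Phi @ W1 - Psi) ** 2).mean()
    loss1.backward()
    opt_phi.step()

    # Stage 2: regress the outcome y on the predicted treatment features phi(z) W1;
    # the stage-2 loss is used to update the treatment feature network psi.
    opt_psi.zero_grad()
    Phi = phi(z).detach()
    W1 = ridge(Phi, psi(x), lam1)
    w2 = ridge(Phi @ W1, y, lam2)
    loss2 = ((Phi @ W1 @ w2 - y) ** 2).mean()
    loss2.backward()
    opt_psi.step()

# The estimated structural function is f(x) = psi(x)^T w2; it should roughly track cos(x).
with torch.no_grad():
    x_test = torch.linspace(-2.0, 2.0, 5).unsqueeze(1)
    print(torch.cat([torch.cos(x_test), psi(x_test) @ w2], dim=1))
```

Because each stage has a closed-form ridge solution, only the two feature networks are trained by gradient descent, which keeps the per-step updates cheap.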
Methods
  • The authors report the empirical performance of the DFIV method.
  • The evaluation considers both low- and high-dimensional treatment variables.
  • In the deep RL context, the authors apply DFIV to perform off-policy policy evaluation (OPE).
  • The algorithms in the first two experiments are implemented in PyTorch (Paszke et al., 2019), and the OPE experiments are implemented in TensorFlow (Abadi et al., 2015) with the Acme RL framework (Hoffman et al., 2020).
Conclusion
  • The authors have proposed a novel method for instrumental variable regression, Deep Feature IV (DFIV), which performs two-stage least squares regression on flexible and expressive features of the instrument and treatment.
  • As a contribution to the IV literature, the authors showed how to adaptively learn these feature maps with deep neural networks.
  • The authors showed that the off-policy policy evaluation (OPE) problem in deep RL can be interpreted as a nonlinear IV regression, and that DFIV performs competitively in this domain (a brief sketch of this reading follows this list).
  • In RL, problems with additional confounders are common (see, e.g., Namkoong et al., 2020; Shang et al., 2019), and the authors believe that adapting DFIV to this setting will be of great value.
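As a rough indication of the connection referred to above, the policy-evaluation Bellman equation can be read as a conditional moment restriction of IV form. The identification of outcome, treatment, and instrument below is a hedged illustration; the paper should be consulted for the precise formulation.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
For a target policy $\pi$ and discount $\gamma$, the policy-evaluation Bellman
equation over observed transitions $(s, a, r, s')$ is
\begin{align*}
  Q^{\pi}(s,a) &= \mathbb{E}\!\left[\, r + \gamma\, Q^{\pi}\!\bigl(s', \pi(s')\bigr) \mid s, a \,\right],
\end{align*}
which can be rearranged as a conditional moment restriction of IV form,
\begin{align*}
  \mathbb{E}\!\left[\, Y - f(X) \mid Z \,\right] = 0,
  \quad\text{with}\quad
  Y = r, \quad X = (s, a, s'), \quad Z = (s, a),
\end{align*}
and structural function (an illustrative identification, assumed here)
\begin{align*}
  f(s, a, s') = Q^{\pi}(s,a) - \gamma\, Q^{\pi}\!\bigl(s', \pi(s')\bigr).
\end{align*}
The randomness in the next state $s'$ plays the role of the confounder, so
estimating $f$ (and hence $Q^{\pi}$) is a nonlinear IV regression problem.
\end{document}
```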
Reference
  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/.
  • J. D. Angrist. Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. The American Economic Review, 80(3):313–336, 1990.
  • J. D. Angrist and A. B. Krueger. Split-sample instrumental variables estimates of the return to schooling. Journal of Business & Economic Statistics, 13(2):225–235, 1995.
  • J. D. Angrist, G. W. Imbens, and D. B. Rubin. Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–455, 1996.
  • J. D. Angrist, G. W. Imbens, and A. B. Krueger. Jackknife instrumental variables estimation. Journal of Applied Econometrics, 14(1):57–67, 1999.
  • L. Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning, 1995.
  • E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: Z-identifiability. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, pages 113–120, 2012.
  • A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, (5):834–846, 1983.
  • A. Bennett, N. Kallus, and T. Schnabel. Deep generalized method of moments for instrumental variable analysis. In Advances in Neural Information Processing Systems 32, pages 3564–3574. 2019.
  • R. Blundell, J. Horowitz, and M. Parey. Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3:29–51, 2012.
  • S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33–57, 1996.
  • M. Carrasco, J.-P. Florens, and E. Renault. Linear inverse problems in structural econometrics estimation based on spectral decomposition and regularization. In Handbook of Econometrics, volume 6B, chapter 77. 2007.
  • X. Chen and T. M. Christensen. Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression: Nonlinear functionals of nonparametric IV. Quantitative Economics, 9:39–84, 2018.
  • S. Darolles, Y. Fan, J. P. Florens, and E. Renault. Nonparametric instrumental regression. Econometrica, 79(5):1541–1565, 2011.
  • D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6(Apr):503–556, 2005.
  • C. Hansen and D. Kozbur. Instrumental variables estimation with many weak instruments using regularized jive. Journal of Econometrics, 182(2):290–308, 2014.
  • L. P. Hansen. Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054, 1982.
  • J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy. Deep IV: A flexible approach for counterfactual prediction. In Proceedings of the 34th International Conference on Machine Learning, 2017.
  • M. Hoffman, B. Shahriari, J. Aslanides, G. Barth-Maron, F. Behbahani, T. Norman, A. Abdolmaleki, A. Cassirer, F. Yang, K. Baumli, S. Henderson, A. Novikov, S. G. Colmenarejo, S. Cabi, C. Gulcehre, T. L. Paine, A. Cowie, Z. Wang, B. Piot, and N. de Freitas. Acme: A research framework for distributed reinforcement learning. arXiv preprint arXiv:2006.00979, 2020.
  • H. M. Le, C. Voloshin, and Y. Yue. Batch policy learning under constraints. arXiv preprint arXiv:1903.08738, 2019.
  • Y. LeCun and C. Cortes. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.
  • L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dSprites: Disentanglement testing sprites dataset, 2017. URL https://github.com/deepmind/dsprites-dataset/.
  • T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • A. W. Moore. Efficient Memory-Based Learning for Robot Control. PhD thesis, Cambridge University, 1990.
  • K. Muandet, A. Mehrjou, S. K. Lee, and A. Raj. Dual IV: A single stage instrumental variable regression. arXiv preprint arXiv:1910.12358, 2019.
  • H. Namkoong, R. Keramati, S. Yadlowsky, and E. Brunskill. Off-policy policy evaluation for sequential decisions under unobserved confounding. arXiv preprint arXiv:2003.05623, 2020.
  • M. Z. Nashed and G. Wahba. Generalized inverses in reproducing kernel spaces: An approach to regularization of linear operator equations. SIAM Journal on Mathematical Analysis, 5 (6):974–987, 1974.
  • W. K. Newey and J. L. Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71(5):1565–1578, 2003.
  • I. Osband, Y. Doron, M. Hessel, J. Aslanides, E. Sezener, A. Saraiva, K. McKinney, T. Lattimore, C. Szepesvari, S. Singh, et al. Behaviour suite for reinforcement learning. In International Conference on Learning Representations, 2019.
  • T. L. Paine, C. Paduraru, A. Michi, C. Gulcehre, K. Zolna, A. Novikov, Z. Wang, and N. de Freitas. Hyperparameter selection for offline reinforcement learning. arXiv preprint arXiv:2007.09055, 2020.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. 2019.
  • A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20, pages 1177–1184. 2008.
  • W. Shang, Y. Yu, Q. Li, Z. Qin, Y. Meng, and J. Ye. Environment reconstruction with hidden confounders for reinforcement learning based recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 566–576, 2019.
  • R. Singh, M. Sahani, and A. Gretton. Kernel instrumental variable regression. In Advances in Neural Information Processing Systems 32, pages 4593–4605. 2019.
  • J. H. Stock and F. Trebbi. Retrospectives: Who invented instrumental variable regression? Journal of Economic Perspectives, 17(3):177–194, 2003.
  • R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 2018.
  • C. Voloshin, H. M. Le, N. Jiang, and Y. Yue. Empirical study of off-policy policy evaluation for reinforcement learning. arXiv preprint arXiv:1911.06854, 2019.
  • M. Wiatrak, S. V. Albrecht, and A. Nystrom. Stabilizing generative adversarial networks: A survey. arXiv preprint arXiv:1910.00927, 2019.
  • P. Wright. The Tariff on Animal and Vegetable Oils. Investigations in International Commercial Policies. Macmillan Company, 1928.