Title: BACKpropagation through BACK substitution with a BACKslash
Authors: Alan Edelman, Ekin Akyürek, Yuyang Wang
Venue: CoRR abs/2303.15449, 2023
URL: https://arxiv.org/abs/2303.15449
Abstract (excerpt; the opening of the abstract is truncated in the source): … that allows the reversal of operators. We demonstrate the elegance of the operator approach in a programming language with generic linear algebra, such as Julia \cite{bezanson2017julia}, and show that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra allows operators as elements of matrices; without rewriting any code, the software carries through to completion and gives the correct answer.
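The claim that generic linear-algebra code "carries through" when matrix entries are operators rather than numbers can be made concrete with a small sketch. The paper's implementation is in Julia; the Python below is only an illustrative analogue, and the names LinearOp and matvec are invented for this sketch, not taken from the paper.

```python
import numpy as np

class LinearOp:
    """Wraps a matrix so it behaves like a scalar entry: * composes/applies, + adds."""
    def __init__(self, A):
        self.A = np.atleast_2d(np.asarray(A, dtype=float))

    def __mul__(self, other):
        if isinstance(other, LinearOp):            # operator * operator = composition
            return LinearOp(self.A @ other.A)
        return self.A @ np.asarray(other)          # operator * vector = application

    def __add__(self, other):
        return LinearOp(self.A + other.A)

def matvec(M, x):
    """Generic matrix-vector product, written once for any entry type supporting * and +."""
    return [sum((M[i][j] * x[j] for j in range(1, len(x))), M[i][0] * x[0])
            for i in range(len(M))]

I2 = LinearOp(np.eye(2))
S  = LinearOp([[0.0, 1.0], [1.0, 0.0]])            # a swap operator
M  = [[I2, S],
      [S, I2]]                                     # a matrix whose entries are operators
x  = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]  # a block vector
print(matvec(M, x))                                # [array([5., 5.]), array([5., 5.])]
```

The point is that matvec is written once against * and + and never inspects whether its entries are numbers or operators; the same routine also works unchanged on plain float matrices.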
Title: What learning algorithm is in-context learning? Investigations with linear models
Authors: Ekin Akyürek, Jacob Andreas, Dale Schuurmans, Tengyu Ma, Denny Zhou
Venue: ICLR 2023
URL: https://openreview.net/forum?id=0g0X4H8yN4I
Keywords: in-context learning, transformers, sequence models, deep learning, meta learning
Abstract: Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding context-specific parametric models in their hidden representations, and updating these implicit models as new examples appear in the context. Using linear regression as a model problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form computation of regression parameters. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may work by rediscovering standard estimation algorithms.
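The three reference predictors named in the abstract — gradient descent, ridge regression, and exact least squares — are standard and easy to reproduce. The numpy sketch below (a generic illustration, not the authors' released code) computes all three on a toy linear-regression problem of the kind used to probe in-context learners.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 16
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Exact least squares (minimum-norm solution via the pseudoinverse).
w_ols = np.linalg.pinv(X) @ y

# Ridge regression with regularization strength lam.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Batch gradient descent on the squared loss, initialized at zero.
w_gd, lr = np.zeros(d), 0.01
for _ in range(2000):
    w_gd -= lr * (X.T @ (X @ w_gd - y)) / n

# Predictions of the three estimators on a held-out query point.
x_query = rng.normal(size=d)
print({name: float(w @ x_query)
       for name, w in {"ols": w_ols, "ridge": w_ridge, "gd": w_gd}.items()})
```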
Title: LexSym: Compositionality as Lexical Symmetry
Authors: Ekin Akyürek, Jacob Andreas
Venue: ACL 2023 (Long Papers), pages 639–657
URL: https://aclanthology.org/2023.acl-long.38/
Abstract: In tasks like semantic parsing, instruction following, and question answering, standard deep networks fail to generalize compositionally from small datasets. Many existing approaches overcome this limitation with model architectures that enforce a compositional process of sentence interpretation. In this paper, we present a domain-general and model-agnostic formulation of compositionality as a constraint on symmetries of data distributions rather than models. Informally, we prove that whenever a task can be solved by a compositional model, there is a corresponding data augmentation scheme — a procedure for transforming examples into other well-formed examples — that imparts compositional inductive bias on any model trained to solve the same task. We describe a procedure called LexSym that discovers these transformations automatically, then applies them to training data for ordinary neural sequence models. Unlike existing compositional data augmentation procedures, LexSym can be deployed agnostically across text, structured data, and even images. It matches or surpasses state-of-the-art, task-specific models on the COGS semantic parsing, SCAN and Alchemy instruction following, and CLEVR-CoGenT visual question answering datasets.
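The kind of transformation LexSym applies can be illustrated with a toy example: swapping two lexicon entries consistently in both the input and the output of a training example yields a new well-formed example. LexSym discovers such transformations automatically; the sketch below hard-codes a tiny SCAN-like lexicon (made up here for illustration) just to show the shape of the augmentation.

```python
# Toy compositional data augmentation via lexical substitution.
LEXICON = {"jump": "JUMP", "walk": "WALK", "run": "RUN"}

def swap_lexemes(example, a, b):
    """Swap words a<->b in the input and their meanings LEXICON[a]<->LEXICON[b] in the output."""
    inp, out = example
    swap_in = {a: b, b: a}
    swap_out = {LEXICON[a]: LEXICON[b], LEXICON[b]: LEXICON[a]}
    new_inp = " ".join(swap_in.get(w, w) for w in inp.split())
    new_out = " ".join(swap_out.get(t, t) for t in out.split())
    return new_inp, new_out

print(swap_lexemes(("jump twice and walk", "JUMP JUMP WALK"), "jump", "walk"))
# -> ('walk twice and jump', 'WALK WALK JUMP')
```

Because the transformation touches only lexical items and leaves the compositional structure intact, the augmented pair is well-formed whenever the original was.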
","authors":[{"id":"64b916a016b3d9192137baa3","name":"Afra Feyza Akyürek"},{"id":"64352fa0f2699869fc1e19f1","name":"Ekin Akyürek"},{"name":"Aman Madaan"},{"id":"63724c37ec88d95668cc9755","name":"Ashwin Kalyan"},{"name":"Peter Clark"},{"id":"5601bd5c45cedb3395eab6a6","name":"Derry Wijaya"},{"id":"53f48030dabfaec09f29ea31","name":"Niket Tandon"}],"create_time":"2023-05-16T04:58:21.706Z","hashs":{"h1":"rgnlf","h3":"rlrmo"},"id":"6462f13cd68f896efa911ee9","num_citation":0,"pages":{"end":"7733","start":"7716"},"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002FEC\u002F59\u002F48\u002FEC5948C9B9DC115DA06411A4603494FB.pdf","title":"RL4F: Generating Natural Language Feedback with Reinforcement Learning\n for Repairing Model Outputs","update_times":{"u_c_t":"2023-10-24T07:03:05.3Z"},"urls":["db\u002Fconf\u002Facl\u002Facl2023-1.html#AkyurekAKCWT23","https:\u002F\u002Faclanthology.org\u002F2023.acl-long.427","https:\u002F\u002Faclanthology.org\u002F2023.acl-long.427\u002F","db\u002Fjournals\u002Fcorr\u002Fcorr2305.html#abs-2305-08844","https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.08844","https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.08844"],"venue":{"info":{"name":"conf_acl"},"volume":"abs\u002F2305.08844"},"venue_hhb_id":"5ea1afddedb6e7d53c00c104","versions":[{"id":"6462f13cd68f896efa911ee9","sid":"2305.08844","src":"arxiv","year":2023},{"id":"6479e3add68f896efa4e705b","sid":"journals\u002Fcorr\u002Fabs-2305-08844","src":"dblp","year":2023},{"id":"64ae66e53fda6d7f06849331","sid":"2023.acl-long.427","src":"conf_acl","year":2023},{"id":"64c78b9f3fda6d7f06db9a3d","sid":"conf\u002Facl\u002FAkyurekAKCWT23","src":"dblp","year":2023}],"year":2023},{"abstract":" Language models (LMs) have been shown to memorize a great deal of factual knowledge contained in their training data. But when an LM generates an assertion, it is often difficult to determine where it learned this information and whether it is true. In this paper, we propose the problem of fact tracing: identifying which training examples taught an LM to generate a particular factual assertion. Prior work on training data attribution (TDA) may offer effective tools for identifying such examples, known as \"proponents\". We present the first quantitative benchmark to evaluate this. We compare two popular families of TDA methods -- gradient-based and embedding-based -- and find that much headroom remains. For example, both methods have lower proponent-retrieval precision than an information retrieval baseline (BM25) that does not have access to the LM at all. We identify key challenges that may be necessary for further improvement such as overcoming the problem of gradient saturation, and also show how several nuanced implementation details of existing neural TDA methods can significantly improve overall fact tracing performance. 
","authors":[{"id":"64352fa0f2699869fc1e19f1","name":"Ekin Akyürek"},{"id":"562d24f045cedb3398d667b5","name":"Tolga Bolukbasi"},{"id":"562d26fa45cedb3398d6ac11","name":"Frederick Liu"},{"id":"652e6e7250dee4c4226dfeab","name":"Binbin Xiong"},{"name":"Ian Tenney"},{"id":"64fad981bb3ef99d3528b551","name":"Jacob Andreas"},{"name":"Kelvin Guu"}],"create_time":"2022-05-24T13:48:36.958Z","hashs":{"h1":"tklmb","h3":"td"},"id":"628c4ce65aee126c0ff59f82","lang":"en","num_citation":0,"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002F98\u002F6B\u002F1D\u002F986B1D45873BF69E8B32A937B65FE616.pdf","pdf_src":["https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.11482"],"title":"Towards Tracing Factual Knowledge in Language Models Back to the\n Training Data","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.11482"],"versions":[{"id":"628c4ce65aee126c0ff59f82","sid":"2205.11482","src":"arxiv","year":2022}],"year":2022},{"abstract":" Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https:\u002F\u002Fgithub.com\u002Fekinakyurek\u002Fgoogle-research\u002Fblob\u002Fmaster\u002Fincontext. 
","authors":[{"id":"64352fa0f2699869fc1e19f1","name":"Ekin Akyürek","org":"Massachusetts Institute of Technology","orgid":"62331e350a6eb147dca8a7ec","orgs":["Massachusetts Institute of Technology"]},{"id":"53f4694ddabfaeb1a7c989d2","name":"Dale Schuurmans","org":"University of Alberta","orgid":"5f71b2941c455f439fe3cd7c","orgs":["University of Alberta"]},{"id":"53f42c2edabfaedf43504560","name":"Jacob Andreas","org":"Massachusetts Institute of Technology","orgid":"62331e350a6eb147dca8a7ec","orgs":["Massachusetts Institute of Technology"]},{"id":"53f46708dabfaec09f242380","name":"Tengyu Ma","org":"Stanford University","orgid":"62331e330a6eb147dca8a6e8","orgs":["Stanford University"]},{"id":"53f4334fdabfaedce550f474","name":"Denny Zhou","org":"Google Brain","orgid":"5f71b2d21c455f439fe3e823","orgs":["Google Brain"]}],"citations":{"google_citation":0,"last_citation":0},"create_time":"2022-11-29T05:04:43.293Z","hashs":{"h1":"laili","h3":"lm"},"id":"6385789190e50fcafdf4c6be","keywords":["in-context learning","transformers","sequence models","deep learning","meta learning"],"lang":"en","num_citation":99,"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002FA7\u002FC8\u002F22\u002FA7C822733C120428745609F5B04D5E5F.pdf","pdf_src":["https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.15661"],"title":"What learning algorithm is in-context learning? Investigations with\n linear models","update_times":{"u_a_t":"2022-11-30T14:54:51.759Z","u_c_t":"2023-10-24T06:46:44.567Z"},"urls":["db\u002Fconf\u002Ficlr\u002Ficlr2023.html#AkyurekSA0Z23","https:\u002F\u002Fopenreview.net\u002Fpdf?id=0g0X4H8yN4I","https:\u002F\u002Fopenreview.net\u002Fforum?id=0g0X4H8yN4I","https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.15661"],"versions":[{"id":"6385789190e50fcafdf4c6be","sid":"2211.15661","src":"arxiv","year":2022},{"id":"63dcdb422c26941cf00b6052","sid":"0g0X4H8yN4I","src":"conf_iclr","vsid":"ICLR.cc\u002F2023\u002FConference","year":2023},{"id":"6433f61f90e50fcafd6c035a","sid":"2023#0g0X4H8yN4I","src":"conf_iclr","vsid":"ICLR.cc\u002F2023\u002FConference","year":2023},{"id":"64a407ddd68f896efaf1cbb4","sid":"conf\u002Ficlr\u002FAkyurekSA0Z23","src":"dblp","year":2023}],"year":2022},{"authors":[{"id":"64352fa0f2699869fc1e19f1","name":"Ekin Akyürek"},{"id":"562d24f045cedb3398d667b5","name":"Tolga Bolukbasi"},{"id":"562d26fa45cedb3398d6ac11","name":"Frederick Liu"},{"id":"652e6e7250dee4c4226dfeab","name":"Binbin Xiong"},{"id":"6176c0fb60a96543aa816ee1","name":"Ian Tenney"},{"id":"53f42c2edabfaedf43504560","name":"Jacob Andreas"},{"id":"61770f8960a96543aa816f7b","name":"Kelvin Guu"}],"create_time":"2023-04-07T15:35:21.079Z","hashs":{"h1":"tklmb","h3":"td"},"id":"6426ed4d90e50fcafd44a50d","num_citation":4,"pages":{"end":"2446","start":"2429"},"title":"Towards Tracing Knowledge in Language Models Back to the Training Data.","update_times":{"u_c_t":"2023-07-18T07:00:22.419Z","u_v_t":"2023-04-15T03:19:50.733Z"},"urls":["db\u002Fconf\u002Femnlp\u002Femnlp2022f.html#AkyurekBLXTAG22","https:\u002F\u002Faclanthology.org\u002F2022.findings-emnlp.180"],"venue":{"info":{"name":"EMNLP (Findings)"}},"venue_hhb_id":"5eba7087edb6e7d53c1009a5","versions":[{"id":"6426ed4d90e50fcafd44a50d","sid":"conf\u002Femnlp\u002FAkyurekBLXTAG22","src":"dblp","vsid":"conf\u002Femnlp","year":2022}],"year":2022},{"abstract":" Humans can reason compositionally when presented with new tasks. 
Title: Compositional Semantic Parsing with Large Language Models
Authors: Andrew Drozdov (University of Massachusetts, Amherst), Nathanael Schärli (Google), Ekin Akyürek (Massachusetts Institute of Technology), Nathan Scales (Google), Xinying Song (Google), Xinyun Chen (Google), Olivier Bousquet (Google), Denny Zhou (Google Brain)
Venue: ICLR 2023
URLs: https://arxiv.org/abs/2209.15003 ; https://openreview.net/forum?id=gJW8hSGBys8
Keywords: large language models, prompting, compositional generalization, natural language processing
Abstract: Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. Our best method is based on least-to-most prompting: it decomposes the problem using prompting-based syntactic parsing, then uses this decomposition to select appropriate exemplars and to sequentially generate the semantic parse. This method allows us to set a new state of the art for CFQ while requiring only 1% of the training data used by traditional approaches. Due to the general nature of our approach, we expect similar efforts will lead to new results in other tasks and domains, especially for knowledge-intensive applications.
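The three-stage recipe described in the abstract — decompose, select exemplars, then sequentially generate the parse — can be sketched as plain control flow around a language-model call. In the sketch below, llm is a hypothetical callable standing in for an LLM API, the prompt strings are simplified placeholders rather than the paper's actual prompts, and exemplar_pool entries are assumed to be dicts with "question", "decomposition", and "parse" keys.

```python
from typing import Callable, List

def least_to_most_parse(question: str, exemplar_pool: List[dict],
                        llm: Callable[[str], str]) -> str:
    # 1. Prompting-based syntactic decomposition of the question into subproblems.
    decomposition = llm(f"Decompose into subquestions:\n{question}").splitlines()

    # 2. Select exemplars whose own decompositions share steps with the subproblems.
    exemplars = [e for e in exemplar_pool
                 if any(step in e["decomposition"] for step in decomposition)][:4]

    # 3. Sequentially build the semantic parse, feeding earlier partial parses back in.
    context = "\n\n".join(f"Q: {e['question']}\nParse: {e['parse']}" for e in exemplars)
    partial = ""
    for step in decomposition:
        partial = llm(f"{context}\n\nQ: {step}\nSo far: {partial}\nParse:")
    return partial
```

The dynamic exemplar selection in step 2 is what lets a small prompt budget cover a task with a large vocabulary, which the abstract identifies as the main obstacle beyond SCAN-style benchmarks.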
2023"}},"versions":[{"id":"63365e7f90e50fcafd1a3626","sid":"2209.15003","src":"arxiv","year":2022},{"id":"63dcdb422c26941cf00b6451","sid":"gJW8hSGBys8","src":"conf_iclr","vsid":"ICLR.cc\u002F2023\u002FConference","year":2023},{"id":"6433f64790e50fcafd6ca384","sid":"2023#gJW8hSGBys8","src":"conf_iclr","vsid":"ICLR.cc\u002F2023\u002FConference","year":2023},{"id":"64a407ddd68f896efaf1cb82","sid":"conf\u002Ficlr\u002FDrozdovSASSCBZ23","src":"dblp","year":2023}],"year":2022},{"abstract":" Standard deep network models lack the inductive biases needed to generalize compositionally in tasks like semantic parsing, translation, and question answering. A large body of work in natural language processing seeks to overcome this limitation with new model architectures that enforce a compositional process of sentence interpretation. In this paper, we present a domain-general framework for compositional modeling that instead formulates compositionality as a constraint on data distributions. We prove that for any task factorizable into a lexicon and a composition function, there exists a family of data transformation functions that are guaranteed to produce new, well-formed examples when applied to training data. We further show that it is possible to identify these data transformations even when the composition function is unknown (e.g. when we do not know how to write or infer a symbolic grammar). Using these transformation functions to perform data augmentation for ordinary RNN and transformer sequence models, we obtain state-of-the-art results on the CLEVR-CoGenT visual question answering dataset, and results comparable to specialized model architectures on the COGS semantic parsing dataset. ","authors":[{"id":"64352fa0f2699869fc1e19f1","name":"Ekin Akyürek"},{"id":"64fad981bb3ef99d3528b551","name":"Jacob Andreas"}],"create_time":"2022-02-01T13:46:17.778Z","hashs":{"h1":"-compositionality lexical symmetry"},"id":"61f8a4c35aee126c0fee0472","lang":"en","num_citation":0,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F22\u002F2201\u002F2201.12926.pdf","pdf_src":["https:\u002F\u002Farxiv.org\u002Fpdf\u002F2201.12926"],"title":"Compositionality as Lexical Symmetry","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.12926"],"versions":[{"id":"61f8a4c35aee126c0fee0472","sid":"2201.12926","src":"arxiv","year":2022}],"year":2022}],"profilePubsTotal":17,"profilePatentsPage":0,"profilePatents":null,"profilePatentsTotal":null,"profilePatentsEnd":false,"profileProjectsPage":1,"profileProjects":{"success":true,"msg":"","data":null,"log_id":"2ZCKQ3ndr4CNZ9AJy0T3KYW4XKt"},"profileProjectsTotal":0,"newInfo":null,"checkDelPubs":[]}};