pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
arXiv (2024)
Abstract
Interventions on model-internal states are fundamental operations in many
areas of AI, including model editing, steering, robustness, and
interpretability. To facilitate such research, we introduce pyvene,
an open-source Python library that supports customizable interventions on a
range of different PyTorch modules. pyvene supports complex
intervention schemes with an intuitive configuration format, and its
interventions can be static or include trainable parameters. We show how
pyvene provides a unified and extensible framework for performing
interventions on neural models and sharing the intervened-upon models with
others. We illustrate the power of the library via interpretability analyses
using causal abstraction and knowledge localization. We publish our library
through the Python Package Index (PyPI) and provide code, documentation, and
tutorials at https://github.com/stanfordnlp/pyvene.
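
To make the idea of an intervention on a model-internal state concrete, the following is a minimal sketch written with plain PyTorch forward hooks. It does not use or reproduce the pyvene API. The toy model, the choice of layer, and the names `capture_hook` and `swap_hook` are assumptions made purely for illustration. The sketch performs a simple static interchange: it records a hidden activation from a run on a source input and substitutes it into a run on a base input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model; it stands in for any PyTorch module.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()
target = model[1]  # intervene on the output of the ReLU layer

base = torch.randn(1, 4)    # input whose forward pass is intervened on
source = torch.randn(1, 4)  # input that supplies the replacement activation

captured = {}

def capture_hook(module, inputs, output):
    # Record the activation; returning None leaves the output unchanged.
    captured["act"] = output.detach().clone()

def swap_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output.
    return captured["act"]

with torch.no_grad():
    # Unmodified run on the base input, for comparison.
    base_out = model(base)

    # Run on the source input and capture the hidden activation.
    handle = target.register_forward_hook(capture_hook)
    source_out = model(source)
    handle.remove()

    # Run on the base input while swapping in the source activation.
    handle = target.register_forward_hook(swap_hook)
    intervened_out = model(base)
    handle.remove()

print("base:      ", base_out)
print("source:    ", source_out)
print("intervened:", intervened_out)
```

Because this sketch replaces the entire hidden layer, the intervened output matches the source output. Swapping only a subset of units or positions, or making the replacement depend on trainable parameters, would follow the same hook pattern with a different `swap_hook`.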