CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li
CoRR (2024)