Multi-Lingual DALL-E Storytime

2023 IEEE Integrated STEM Education Conference (ISEC) (2023)

Abstract
Visualizations are a vital educational tool that helps learners comprehend and retain information. Recent advances in artificial intelligence and automatic visualization tools, such as OpenAI’s DALL-E, have greatly improved the ability to generate images from text prompts. However, most text-to-image tools share a major drawback: they have limited ability to create a series of consecutive, coherent frames that tell a story or illustrate a process that changes over time. Instead, they produce only a few isolated images per input prompt. These tools also pose an added challenge for populations with limited English proficiency, which widens the educational divide between children from diverse backgrounds and restricts their access to innovative technology. Here, we introduce a DALL-E storytelling framework designed for fast and coherent visualization of non-English songs, stories, and biblical texts. Our framework extends the original DALL-E model to handle non-English input and allows users to specify constraints on story elements, such as a specific location or context. Its key advantage over manual editing of DALL-E images is that it automates the process, offering a more seamless and intuitive user experience and eliminating manual editing that is time-consuming and requires technical expertise. The visualization masks are automatically adjusted to form a coherent story, so that the figures and objects in each frame remain consistent and keep their meaning throughout the visualization, giving the viewer a much smoother experience.
Our results demonstrate that our framework visualizes stories quickly and coherently, conveys changes in the plot over time, and maintains a consistent style throughout the narrative. By enabling the visualization of non-English texts, the framework helps bridge the gap between populations and promotes equal access to technology and education, particularly for children and individuals who struggle to understand complex narrative texts, such as fast-paced songs and biblical stories. This has the potential to significantly enhance literacy and foster a deeper understanding of texts.
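
The abstract does not give implementation details, so the following is only a minimal sketch of one idea it describes: translating each line of a non-English story and attaching the same user-specified constraints (location, style) to every frame prompt so that consecutive frames share a setting. The function `build_frame_prompts`, the `translate` callable, and the toy dictionary are hypothetical names introduced for illustration; the sketch does not call DALL-E and does not attempt the automatic mask adjustment the authors mention.

```python
from typing import Callable, List, Optional


def build_frame_prompts(
    story_lines: List[str],
    translate: Callable[[str], str],
    location: Optional[str] = None,
    style: Optional[str] = None,
) -> List[str]:
    """Turn each non-empty story line into one English image prompt.

    The same location and style constraints are appended to every prompt,
    so all frames are requested with a shared setting and look.
    """
    prompts = []
    for line in story_lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines between verses or paragraphs
        # Translate, then drop trailing punctuation before adding constraints.
        parts = [translate(text).rstrip(". ")]
        if location:
            parts.append(f"set in {location}")
        if style:
            parts.append(f"in the style of {style}")
        prompts.append(", ".join(parts))
    return prompts


# Toy lookup table standing in for a real machine-translation step
# (an assumption for this sketch, not the paper's translation method).
TOY_DICTIONARY = {
    "El gato duerme.": "The cat sleeps.",
    "El gato salta.": "The cat jumps.",
}


def toy_translate(text: str) -> str:
    # Fall back to the input unchanged when no translation is known.
    return TOY_DICTIONARY.get(text, text)


frames = build_frame_prompts(
    ["El gato duerme.", "", "El gato salta."],
    translate=toy_translate,
    location="a sunny garden",
    style="a watercolor children's book",
)
for prompt in frames:
    print(prompt)
```

Each resulting string could then be passed to a text-to-image model, one prompt per frame; keeping the constraints identical across prompts is a simple stand-in for the consistency the framework aims for, not a replacement for it.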
Keywords
AI, diversity, education, storytelling, visualization