Human-centered compression for efficient text input

Efficient Text Entry (2006)

Abstract
… of natural language text input under degraded conditions, for instance by disabled users or on mobile telephones. Previous approaches to this problem were based on prediction of the input text. Prediction, however, requires the user to take overt action to verify or select the system's predictions, causing an increased cognitive load that eliminates any speed advantages (Goodenough-Trepagnier, Rosen, and Galdieri, 1986). We have developed an alternative method that takes advantage of the duality between prediction and compression. Using this method, users input text in a compressed form, which the system then automatically decodes to generate the full text. Because the system's operation is completely independent from the user's, the overhead from cognitive task switching and attending to the system's actions online is eliminated, enabling efficiency improvements.

Compression follows a simple rule that requires users to drop all vowels except at the beginning of a word, as well as any consecutive doubled consonants. For instance, the word "association" would be abbreviated "asctn". Since multiple words might be abbreviated to the same character sequence, we have implemented a disabbreviation method that searches for the most likely decoding of the sentence. The method is implemented as a composition of several weighted finite-state transducers (Pereira and Riley, 1997). A key component of this model is a smoothed n-gram language model of the words. The model transduces word sequences, weighted according to the language model, to the corresponding abbreviated character sequences. Unseen words are handled using an n-gram model over letters. To disabbreviate a given input, we use Viterbi decoding to find the most likely path through the transducer that could have generated the abbreviated text. The system is implemented using the AT&T FSM and GRM libraries.

We have conducted an automated decoding study in which we first abbreviated a held-out corpus of Wall Street Journal text of about 840,000 words, yielding a character reduction of 26.4%. We then applied the decoding procedure and compared the decoding result with the original text, yielding the residual error rates shown for a variety of language models in Table 1. To assess the usability and the efficiency …
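As an illustration of the abbreviation rule described in the abstract, here is a minimal Python sketch (not part of the paper). It collapses doubled letters and drops non-initial vowels; the tokenization and the handling of letters that become adjacent only after vowel removal are assumptions, since the abstract does not specify them.

```python
import re

def abbreviate_word(word: str) -> str:
    """Abbreviate one lowercase word: collapse doubled letters,
    then drop every vowel except a word-initial one."""
    if not word:
        return word
    collapsed = re.sub(r"([a-z])\1", r"\1", word)          # e.g. "ss" -> "s"
    return collapsed[0] + re.sub(r"[aeiou]", "", collapsed[1:])

def abbreviate(text: str) -> str:
    """Abbreviate each alphabetic token, leaving other characters untouched."""
    return re.sub(r"[a-z]+", lambda m: abbreviate_word(m.group(0)), text.lower())

print(abbreviate_word("association"))   # -> "asctn", matching the paper's example
```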
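The paper's decoder is built from composed weighted finite-state transducers using the AT&T FSM and GRM libraries. As a much-simplified illustration of the underlying idea only, the sketch below runs Viterbi decoding over a bigram word model and an abbreviation-to-word lexicon; the data structures (`candidates`, `bigram_logprob`), the toy probabilities, and the pass-through handling of unseen abbreviations are assumptions for the example, not the paper's implementation.

```python
import math

def viterbi_disabbreviate(abbrevs, candidates, bigram_logprob, start="<s>"):
    """Return the most likely word sequence for a list of abbreviated tokens.

    abbrevs        -- abbreviated tokens, e.g. ["th", "bt"]
    candidates     -- dict: abbreviation -> list of full words that abbreviate to it
    bigram_logprob -- function (prev_word, word) -> log P(word | prev_word)
    """
    # Each layer maps a candidate word to (best log-prob so far, backpointer).
    layers = [{start: (0.0, None)}]
    for abbr in abbrevs:
        prev_layer, layer = layers[-1], {}
        for word in candidates.get(abbr, [abbr]):       # pass unknown abbreviations through
            score, back = max(
                ((s + bigram_logprob(p, word), p) for p, (s, _) in prev_layer.items()),
                key=lambda t: t[0],
            )
            layer[word] = (score, back)
        layers.append(layer)

    # Backtrace from the best-scoring final word.
    word = max(layers[-1], key=lambda w: layers[-1][w][0])
    decoded = []
    for layer in reversed(layers[1:]):
        decoded.append(word)
        word = layer[word][1]
    return list(reversed(decoded))

# Toy usage with a hypothetical lexicon and bigram table.
lexicon = {"th": ["the"], "bt": ["bit", "but", "bat"]}
bigrams = {("<s>", "the"): 0.9, ("the", "bit"): 0.5, ("the", "but"): 0.3, ("the", "bat"): 0.2}
logp = lambda prev, word: math.log(bigrams.get((prev, word), 1e-6))
print(viterbi_disabbreviate(["th", "bt"], lexicon, logp))   # -> ['the', 'bit']
```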
Keywords
cognitive load, task switching, language model, error rate, Viterbi decoder, natural language