Optimizations for the EcoPod field identification tool

BMC Bioinformatics (2008)

Abstract
Background: We sketch our species identification tool for palm-sized computers, which helps knowledgeable observers with census activities. An algorithm turns an identification matrix into a minimal-length series of questions that guide the operator towards identification. Historic observation data from the census geographic area helps minimize the number of questions. We explore how much historic data is required to boost performance, and whether the use of history negatively impacts identification of rare species. We also explore how characteristics of the matrix interact with the algorithm, and how best to predict the probability of observing a previously unseen species.

Results: Point counts of birds taken at Stanford University's Jasper Ridge Biological Preserve between 2000 and 2005 were used to examine the algorithm. A computer identified species by correctly answering the algorithm's questions and counting them. We also explored how the character density of the key matrix and the theoretical minimum number of questions for each bird in the matrix influenced the algorithm. Our investigation of the required probability smoothing determined whether Laplace smoothing of observation probabilities was sufficient, or whether the more complex Good-Turing technique was required.

Conclusion: Historic data improved identification speed, but only for the 25% most frequently observed birds. For rare birds, the history-based algorithms did not impose a noticeable penalty in the number of questions required for identification. For our dataset, neither the age of the historic data nor the number of observation years impacted the algorithm. Density of characters for different taxa in the identification matrix did not impact the algorithms. Intrinsic differences in identifying different birds did affect the algorithm, but the differences affected the baseline method of not using historic data to exactly the same degree. We found that Laplace smoothing performed better for rare species than Simple Good-Turing, and that, contrary to expectation, the technique did not then adversely affect identification performance for frequently observed birds.
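
As a rough illustration of the Laplace smoothing mentioned above, the following sketch adds a pseudo-count to every species in the identification matrix so that a species never seen in the historic data still receives a nonzero probability. This is not the paper's implementation: the function name `laplace_smooth`, the `alpha` parameter, and the species counts are all hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter


def laplace_smooth(counts, species, alpha=1.0):
    """Return add-alpha (Laplace) smoothed observation probabilities.

    counts  -- mapping of species name to historic observation count
    species -- every species in the identification matrix, seen or not
    alpha   -- pseudo-count added to each species (1.0 is classic Laplace)
    """
    total = sum(counts.get(s, 0) for s in species)
    denom = total + alpha * len(species)
    return {s: (counts.get(s, 0) + alpha) / denom for s in species}


# Hypothetical historic counts; "Golden Eagle" was never observed.
history = Counter({"California Towhee": 6, "Acorn Woodpecker": 3})
species = ["California Towhee", "Acorn Woodpecker", "Golden Eagle"]

probs = laplace_smooth(history, species)
for name in species:
    print(f"{name}: {probs[name]:.3f}")
```

With these made-up counts the unseen species gets probability 1/12 rather than 0, which is the property that keeps rare or previously unobserved species reachable when history is used to order questions.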
Keywords
Natural Language Processing, Smoothing Algorithm, Laplace Smoothing, Bird Observation, Question Tree