Cache-oblivious algorithms

CIAC, pp.5-5, (2003)


Abstract

Computers with multiple levels of caching have traditionally required techniques such as data blocking in order for algorithms to exploit the cache hierarchy effectively. These "cache-aware" algorithms must be properly tuned to achieve good performance using so-called "voodoo" parameters which depend on hardware properties, such as cache ...

Introduction
  • The cache oblivious model is a simple and elegant model for designing algorithms that perform well on the hierarchical memories ubiquitous in current systems.
  • Section 7 presents a theoretically optimal, randomized cache oblivious sorting algorithm, along with the running times of an implementation.
  • Strassen’s matrix multiplication, quicksort, mergesort, closest pair [16], convex hulls [7], and median selection [16] are all cache oblivious algorithms, though not all of them are optimal in this model.
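To make the model concrete, here is a minimal sketch of a miss counter for a two-level memory. It assumes a fully associative cache of `cache_size` words organised in blocks of `block_size` words with LRU replacement; the function and parameter names are hypothetical, and LRU is a stand-in for the optimal replacement policy usually assumed in the analysis. The example compares a row-major and a column-major traversal of a 64 x 64 array stored in row-major order.

```python
from collections import OrderedDict


def count_misses(addresses, cache_size, block_size):
    """Count misses of a fully associative LRU cache.

    cache_size and block_size are measured in words; both are
    illustrative parameters, not values taken from the paper.
    """
    capacity = cache_size // block_size  # number of blocks the cache holds
    cache = OrderedDict()                # block id -> True, ordered by recency
    misses = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            cache.move_to_end(block)     # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used block
    return misses


n = 64
row_major = [i * n + j for i in range(n) for j in range(n)]
col_major = [i * n + j for j in range(n) for i in range(n)]

row_misses = count_misses(row_major, cache_size=256, block_size=8)
col_misses = count_misses(col_major, cache_size=256, block_size=8)
print(row_misses, col_misses)
```

With these parameters the row-major scan touches each block once, while the column-major scan cycles through more blocks than the cache can hold and misses on every access.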
Highlights
  • Chapter Outline: We introduce the cache oblivious model in section 2
  • We study the cache oblivious analysis of Strassen’s algorithm in section 5
  • The code written for this experiment is under 300 lines. The experiments reported in Figure 5 were run on a dual-processor Itanium system with 2 GB of random access memory (RAM); only one processor was used.
  • We present here problems, related bounds and references for more interested readers
  • Note that in the table, sort() and scan() denote the number of cache misses incurred by the sorting and scanning functions of an optimal cache oblivious implementation.
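For orientation, the standard bounds behind sort() and scan() can be written as follows. This is a summary of well-known results rather than a quotation from the table: N is the input size, M the cache size and B the block size (all in words), and the sorting bound for cache oblivious algorithms is usually stated under a tall-cache assumption such as M = Ω(B²).

```latex
\[
  \mathrm{scan}(N) = \Theta\!\left(1 + \frac{N}{B}\right),
  \qquad
  \mathrm{sort}(N) = \Theta\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right).
\]
```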
Results
  • The sorting lower bound in the cache oblivious model is the same as in the external memory model (See Chapter ??).
  • Before going into the cache oblivious, divide-and-conquer algorithm for matrix transposition, the authors look at some experimental results.
  • Remark: Figure 2 shows the effect of using a blocked cache oblivious algorithm for matrix transposition.
  • It is easy to code, uses the fact that the memory consists of a cache hierarchy, and could be exploited to speed up tree based search structures on most current machines.
  • Before getting one's hands “dirty” implementing an algorithm in the cache oblivious or the external memory model, one should be aware of practical things that might
  • The authors list a few practical glitches that are shared by both the cache oblivious and the external memory model.
  • Code written and algorithms designed with the following things in mind could be a lot faster than a direct implementation of an algorithm that is optimal in either the cache oblivious or the external memory model.
  • One can overcome this problem by writing one’s own paging system over the OS to do experimentation of cache oblivious algorithms on huge data sizes.
  • The authors' major conclusions are as follows: limited associativity in the mapping from main memory addresses to cache sets can significantly degrade running time; the limited number of TLB entries can lead to thrashing; the fanciest optimal algorithms are not competitive on real machines even at fairly large problem sizes unless cache miss penalties are quite high; low level performance tuning “hacks”, such as register tiling and array alignment, can significantly distort the effect of improved algorithms, ...
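The divide-and-conquer matrix transposition mentioned above can be sketched as follows. This is an illustrative version, not the code used for the reported experiments: the function names and the `cutoff` parameter are hypothetical, and the recursion halves the larger dimension until the submatrix is small, at which point a plain loop takes over (the "blocked" variant). Python nested lists are not contiguous in memory, so the sketch shows the recursion structure only, not the cache behaviour.

```python
def transpose_rec(A, T, r0, c0, rows, cols, cutoff):
    """Write the transpose of A[r0:r0+rows][c0:c0+cols] into T."""
    if rows <= cutoff and cols <= cutoff:
        # Base case: small block, transpose it directly.
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                T[j][i] = A[i][j]
        return
    if rows >= cols:
        # Split the row range in two halves.
        half = rows // 2
        transpose_rec(A, T, r0, c0, half, cols, cutoff)
        transpose_rec(A, T, r0 + half, c0, rows - half, cols, cutoff)
    else:
        # Split the column range in two halves.
        half = cols // 2
        transpose_rec(A, T, r0, c0, rows, half, cutoff)
        transpose_rec(A, T, r0, c0 + half, rows, cols - half, cutoff)


def transpose(A, cutoff=8):
    """Return the transpose of a non-empty rectangular matrix A."""
    rows, cols = len(A), len(A[0])
    T = [[None] * rows for _ in range(cols)]
    transpose_rec(A, T, 0, 0, rows, cols, cutoff)
    return T


A = [[i * 7 + j for j in range(7)] for i in range(5)]
T = transpose(A, cutoff=2)
```

Note that no cache parameter appears in the recursion itself; `cutoff` only limits recursion overhead and is the kind of tuning knob a blocked implementation would expose.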
Conclusion
  • (Toy experiments comparing quicksort with a modified funnelsort or distribution sort don’t count!) Currently the only impressive code that might back up “practicality” claims of cache oblivious algorithms is FFTW [18].
  • Matrix multiplication and transposition using blocked cache oblivious algorithms do fairly well in comparison with cache aware/external memory algorithms.
  • For matrix transposition, there are at least two cache oblivious algorithms coded in [13].
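A blocked cache oblivious matrix multiplication of the kind referred to above can be sketched in the same style. This is a minimal illustration under stated assumptions, not the implementation that was benchmarked: the names `matmul_rec`, `matmul` and `cutoff` are hypothetical, and the recursion always halves the largest of the three dimensions before falling back to a plain triple loop on small blocks.

```python
def matmul_rec(A, B, C, i0, j0, k0, n, m, p, cutoff):
    """Accumulate A[i0:i0+n, k0:k0+m] times B[k0:k0+m, j0:j0+p] into C."""
    if max(n, m, p) <= cutoff:
        # Base case: ordinary triple loop on a small block.
        for i in range(i0, i0 + n):
            for k in range(k0, k0 + m):
                a = A[i][k]
                for j in range(j0, j0 + p):
                    C[i][j] += a * B[k][j]
        return
    if n >= m and n >= p:
        half = n // 2  # split the rows of A and C
        matmul_rec(A, B, C, i0, j0, k0, half, m, p, cutoff)
        matmul_rec(A, B, C, i0 + half, j0, k0, n - half, m, p, cutoff)
    elif m >= p:
        half = m // 2  # split the shared (inner) dimension
        matmul_rec(A, B, C, i0, j0, k0, n, half, p, cutoff)
        matmul_rec(A, B, C, i0, j0, k0 + half, n, m - half, p, cutoff)
    else:
        half = p // 2  # split the columns of B and C
        matmul_rec(A, B, C, i0, j0, k0, n, m, half, cutoff)
        matmul_rec(A, B, C, i0, j0 + half, k0, n, m, p - half, cutoff)


def matmul(A, B, cutoff=16):
    """Return the product of non-empty rectangular matrices A and B."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    matmul_rec(A, B, C, 0, 0, 0, n, m, p, cutoff)
    return C


A = [[i + j for j in range(5)] for i in range(3)]
B = [[i * j + 1 for j in range(4)] for i in range(5)]
C = matmul(A, B, cutoff=2)
```

As with the transposition sketch, nested Python lists only convey the recursion pattern; measuring actual cache effects would need a contiguous array layout in a lower-level language.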
Funding
  • The author is partially supported by NSF (CCR-9732220, CCR-0098172) and by a grant from Sandia National Labs.
Reference
  • A. Aggarwal, B. Alpern, A. K. Chandra, and M. Snir. A model for hierarchical memory. In Proc. 19th Annu. ACM Sympos. Theory Comput., pages 305–313, 1987.
  • A. Aggarwal and A. K. Chandra. Virtual memory algorithms. In Proc. 20th Annu. ACM Sympos. Theory Comput., pages 173–185, 1988.
  • A. Aggarwal, A. K. Chandra, and M. Snir. Hierarchical memory with block transfer. In Proc. 28th Annu. IEEE Sympos. Found. Comput. Sci., pages 204–216, 1987.
  • A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31:1116–1127, 1988.
  • B. Alpern, L. Carter, and E. Feig. Uniform memory hierarchies. In FOCS, pages 600–608, 1990.
  • B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. Algorithmica, 12(2-3), 1994.
  • N. M. Amato and Edgar A. Ramos. On computing Voronoi diagrams by divide-prune-and-conquer. In Proc. 12th Annu. ACM Sympos. Comput. Geom., pages 166–175, 1996.
  • L. Arge, M. A. Bender, E. D. Demaine, B. Holland-Minkley, and J. I. Munro. Cache-oblivious priority queue and graph algorithm applications. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 268–276, 2002.
  • M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cache-oblivious dynamic dictionary. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 29–38, 2002.
  • G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. Greg Plaxton, S. J. Smith, and M. Zagha. A comparison of sorting algorithms for the connection machine CM-2. In ACM Symposium on Parallel Algorithms and Architectures, pages 3–16, 1991.
  • G. S. Brodal and R. Fagerberg. Funnel heap - a cache oblivious priority queue. In Proc. 13th Annual International Symposium on Algorithms and Computation, Lecture Notes in Computer Science. 2002.
  • G. S. Brodal, R. Fagerberg, and R. Jacob. Cache oblivious search trees via binary trees of small height. Technical Report BRICS-RS-01-36, BRICS, Department of Computer Science, University of Aarhus, October 2001.
  • S. Chatterjee and S. Sen. Cache-efficient matrix transposition. In HPCA, pages 195–205, 2000.
  • Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In Proc. 6th ACM-SIAM Sympos. Discrete Algorithms, pages 139–149, 1995.
  • D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progression. Journal of Symbolic Computation, 9:251–280, 1990.
  • T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.
  • N. Eiron, M. Rodeh, and I. Steinwarts. Matrix multiplication: A case study of algorithm engineering. In 2nd Workshop on Algorithm Engineering, volume 16, pages 98–109, 1998.
  • M. Frigo. A fast fourier transform compiler. In PLDI’99 — Conference on Programming Language Design and Implementation, Atlanta, GA, 1999.
  • M. Frigo. Portable high-performance programs. Technical Report MIT/LCS/TR785, 1999.
  • M. Frigo, Charles E. Leiserson, H. Prokop, and S. Ramachandran. Cache oblivious algorithms. In Proc. 40th Annual Symposium on Foundations of Computer Science, October 1999.
  • R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, Reading, MA, 1989.
  • J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., 1990.
  • J. W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC, pages 326–333, 1981.
  • R. E. Ladner, R. Fortna, and B. H. Nguyen. A comparison of cache aware and cache oblivious static search trees using program instrumentation. To appear in LNCS volume devoted to Experimental Algorithmics, April 2002.
  • A. LaMarca and R.E. Ladner. The influence of caches on the performance of sorting. In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 370–379, 5–7 January 1997.
  • C. Nyberg, T. Barclay, Z. Cvetanovic, J. Gray, and D. B. Lomet. AlphaSort: A RISC machine sort. In R. T. Snodgrass and M. Winslett, editors, Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994, pages 233–242. ACM Press, 1994.
  • N. Rahman, R. Cole, and R. Raman. Optimized predecessor data structures for internal memory. In 5th Workshop on Algorithms Engineering (WAE), 2001.
  • J. E. Savage. Extending the Hong-Kung model to memory hierarchies. In Proceedings of the 1st Annual International Conference on Computing and Combinatorics, volume 959 of LNCS, pages 270–281, August 1995.
  • S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 829–838, January 2000.
  • D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and paging rules. Commun. ACM, 28:202–208, 1985.
  • V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354–356, 1969.
  • O. Temam, C. Fricker, and William Jalby. Cache interference phenomena. In Measurement and Modeling of Computer Systems, pages 261–271, 1994.
  • Journal on Matrix Analysis and Applications, 18(4):1065–1081, October 1997.
  • D. S. Wise. Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free. In Euro-Par 2000 – Parallel Processing, volume 1900 of LNCS, pages 774–784, August 2000.
  • Q. Yi, V. Adve, and K. Kennedy. Transforming loops to recursion for multilevel memory hierarchies. In Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 169–181, Vancouver, Canada, June 2000. ACM.