BEACON: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support

2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)(2022)

引用 3|浏览2
Genome analysis benefits precise medical care, wildlife conservation, pandemic treatment (e.g., COVID-19), and so on. Unfortunately, in genome analysis, the speed of data processing lags far behind the speed of data generation. Thus, hardware acceleration turns out to be necessary. As many applications in genome analysis are memory-bound, Processing-In-Memory (PIM) and Near-Data-Processing (NDP) solutions have been explored to tackle this problem. In particular, the Dual-Inline-Memory-Module (DIMM) based designs are very promising due to their non-invasive feature to the cost-sensitive DRAM dies. However, they have two critical limitations, i.e., performance bottle-necked by communication and the limited potential for memory expansion. In this paper, we address these two limitations by designing novel DIMM based accelerators located near the dis-aggregated memory pool with the support from the Compute Express Link (CXL), aiming to leverage the abundant memory within the memory pool and the high communication bandwidth provided by CXL. We propose BEACON, Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support. BEAC-ON ad-opts a software-hardware co-design approach to tackle the above two limitations. The BEACON architecture builds the foundation for efficient communication and memory expansion by reducing data movement and leveraging the high communication bandwidth provided by CXL. Based on the BEACON architecture, we propose a memory management framework to enable memory expansion with unmodified CXL-DIMMs and further optimize communication by improving data locality. We also propose algorithm-specific optimizations to further boost the performance of BEACON. In addition, BEACON provides two design choices, i.e., BEACON- D and BEACON-S. BEACON-D and BEACON-S perform the computation within the enhanced CXL-DIMMs and enhanced CXL-Switches, respectively. Experimental results show that compared with state-of-the-art DIMM based NDP accelerators, on average, BEACON-D and BEACON-S improve the performance by 4. 70x and 4. 13x, respectively.
genome analysis,near-data-processing,software-hardware co-design,accelerator,memory dis-aggregation
AI 理解论文