CoPA: Cold Page Awakening to Overcome Retention Failures in STT-MRAM Based I/O Buffers

IEEE Transactions on Parallel and Distributed Systems (2022)

Abstract
Performance and reliability are two prominent factors in the design of data storage systems. To achieve higher performance, storage system designers have recently adopted Dynamic RAM (DRAM)-based buffers. The volatility of DRAM, however, introduces the possibility of data loss and data inconsistency. Thus, a part of the main storage is conventionally used as a journal area so that unflushed data pages can be recovered after a power failure. Moreover, periodically flushing buffered data pages to the main storage is a common mechanism for preserving a high level of reliability. This scheme, however, considerably increases storage write traffic, which adversely affects performance. To address this shortcoming, recent studies propose a small Non-Volatile Memory (NVM) as the Persistent Journal Area (PJA) alongside DRAM, an efficient approach that overcomes DRAM's vulnerability to power failure while effectively reducing storage write traffic. This approach, named NVM-Backed Buffer (NVB-Buffer), benefits from the advantages of NVMs and addresses the shortcomings of DRAM. In this article, we employ the most promising of the emerging technologies for the PJA, Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), which meets the requirements of an efficient PJA by providing high endurance, non-volatility, and DRAM-like latency. Despite these advantages, STT-MRAM faces major reliability challenges, namely retention failure, read disturbance, and write failure, which have not been addressed in previously proposed NVB-Buffers. We first demonstrate that retention failure is the dominant source of errors in NVB-Buffers, because such buffers suffer from long and unpredictable page idle intervals (i.e., the time between two consecutive accesses to a PJA page).
Then, we propose a novel NVB-Buffer management scheme, named Cold Page Awakening (CoPA), which predictably reduces the idle time of PJA pages. To this end, CoPA employs Distant Refreshing to periodically overwrite the contents of vulnerable PJA pages by opportunistically using their replicas in the DRAM-based buffer. We compare CoPA with state-of-the-art schemes over several well-known storage workloads based on physical journaling. Our evaluations show that CoPA significantly reduces the maximum page idle time, which leads to a three-orders-of-magnitude lower failure rate with negligible performance degradation (1.1%) and memory overhead (1.2%).
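
The refreshing idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PjaPage` type, the `refresh_cold_pages` and `max_idle_time` helpers, the `max_idle` threshold, and the choice to skip cold pages that have no DRAM replica are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PjaPage:
    """Hypothetical model of one page in the persistent journal area."""
    data: bytes
    last_access: float  # time of the most recent access to this page


def refresh_cold_pages(pja, dram, now, max_idle):
    """Overwrite PJA pages idle for longer than max_idle from their DRAM replica.

    pja maps page id -> PjaPage; dram maps page id -> replica bytes.
    Returns (refreshed, skipped): ids that were rewritten, and ids of cold
    pages left untouched because no DRAM replica exists (an assumption of
    this sketch, not a policy stated in the abstract).
    """
    refreshed, skipped = [], []
    for page_id, page in pja.items():
        if now - page.last_access <= max_idle:
            continue  # accessed recently enough, nothing to do
        replica = dram.get(page_id)
        if replica is None:
            skipped.append(page_id)
            continue
        page.data = replica      # rewrite the page contents from the replica
        page.last_access = now   # the page's idle interval restarts here
        refreshed.append(page_id)
    return refreshed, skipped


def max_idle_time(pja, now):
    """Longest current idle time over all PJA pages (0.0 if there are none)."""
    return max((now - p.last_access for p in pja.values()), default=0.0)


# Example: three journal pages, two of which have a replica in DRAM.
pja = {1: PjaPage(b"a", 0.0), 2: PjaPage(b"b", 90.0), 3: PjaPage(b"c", 5.0)}
dram = {1: b"a", 2: b"b"}
refreshed, skipped = refresh_cold_pages(pja, dram, now=100.0, max_idle=50.0)
```

In this sketch, calling `refresh_cold_pages` at a fixed period bounds the idle time of every page that still has a DRAM replica, which is the quantity the abstract reports as reduced.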
Keywords
Data storage systems, persistent journal area, STT-MRAM, retention failure