Ph.D. student Zhiyuan Lu, Computer Science, to Present Dissertation Proposal on May 19

Ph.D. student Zhiyuan Lu (Computer Science) will present his Ph.D. dissertation proposal on Friday, May 19, 2023, from 10-11 a.m. via Zoom.

Join the Zoom meeting here.

Lu is advised by Jianhui Yue. His advisory committee comprises Soner Onder, Zhenlin Wang, and Qinghui Chen.

Proposal Title

“Optimizing Memory Management for Improved Efficiency in High-Performance Computing Systems”

Proposal Abstract

The emergence of new non-volatile memory (NVM) technology and deep neural network (DNN) inference presents challenges for off-chip memory access. In NVM systems, ensuring crash consistency requires additional memory operations and exposes memory update operations on the critical execution path. DNN inference on accelerators, in turn, suffers from intensive off-chip memory access. This proposal seeks to address these memory challenges in high-performance computing systems.

The logging operations required for crash consistency incur severe performance overhead because of the additional memory accesses they generate. To reduce the time needed to persist log requests, we propose a load-aware log entry allocation (LALEA) scheme that allocates each log request to an address whose bank has the lightest workload. To address the intra-record ordering issue, we propose buffering log metadata (BLOM) in a non-volatile ADR buffer until the corresponding log can be removed. Moreover, the recently proposed LAD introduces unnecessary logging operations on multicore CPUs. To reduce these unnecessary operations, we design two-stage transaction execution (TSTE) and virtual ADR buffers (VADR).
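
As a rough illustration of the load-aware allocation idea, the sketch below picks, among the free log entries, the one whose memory bank currently has the fewest pending requests. It is not the implementation from the proposal: the class name LoadAwareLogAllocator, the use of pending-request counts as the measure of bank workload, and the modulo mapping from log entry to bank are all assumptions made for the example.

```python
class LoadAwareLogAllocator:
    """Toy model of load-aware log entry allocation (hypothetical names)."""

    def __init__(self, num_banks, num_entries, pending=None):
        self.num_banks = num_banks
        # Indices of log entries that are currently free.
        self.free_entries = set(range(num_entries))
        # Assumed workload metric: requests queued at each bank.
        self.pending = list(pending) if pending else [0] * num_banks

    def bank_of(self, entry):
        # Assumption: log entries are interleaved across banks by modulo.
        return entry % self.num_banks

    def allocate(self):
        """Return the free entry whose bank has the lightest workload."""
        if not self.free_entries:
            raise RuntimeError("no free log entries")
        # Ties are broken by the lowest entry index for determinism.
        entry = min(self.free_entries,
                    key=lambda e: (self.pending[self.bank_of(e)], e))
        self.free_entries.remove(entry)
        self.pending[self.bank_of(entry)] += 1
        return entry

    def persisted(self, entry):
        """The log write for this entry has reached memory."""
        self.pending[self.bank_of(entry)] -= 1

    def release(self, entry):
        """The log entry is no longer needed and can be reused."""
        self.free_entries.add(entry)


allocator = LoadAwareLogAllocator(num_banks=4, num_entries=8,
                                  pending=[3, 0, 2, 1])
first = allocator.allocate()   # bank 1 is idle, so an entry on bank 1 is chosen
second = allocator.allocate()  # banks 1 and 3 are now tied; lowest index wins
```

In this toy model a log request avoids banks that are already busy, which is the intuition behind shortening log persistence time; a real memory controller would track workload in hardware and with a different address mapping.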

To meet the low response time requirements and high computational intensity of DNN inference, these computations are often executed on hardware accelerators in data centers. However, loading data from off-chip memory takes longer than the computation itself, thereby reducing performance. This proposal aims to tackle this issue by reducing memory access latency. Specifically, it seeks to increase memory access parallelism and decrease the number of off-chip memory accesses, reducing loading time and improving overall system performance.