Adaptive Memory Resource Management in a Data Center - A Transfer Learning Approach


Researcher: Steven Carr, PI

Sponsor: National Science Foundation, CSR: Small: Collaborative Research

Amount of Support: $112,000

Duration of Support: 5 years

Abstract: Cloud computing has become a dominant scalable computing platform for both online services and conventional data-intensive computing (examples include Amazon’s EC2, Microsoft’s Azure, and IBM’s SmartCloud). Cloud computing data centers share computing resources among a large set of users, giving users cost-effective access to computational power and data storage that would not be practical for an individual. A data center often has to over-commit its resources to meet Quality of Service contracts. The data center software needs to manage its resources effectively to meet the demands of users submitting a variety of applications, without any prior knowledge of these applications.

This work focuses on the management of memory resources in a data center. Recent progress in transfer learning methods inspires the creation of dynamic models that predict the cache and memory requirements of an application. The project has four main tasks: (i) an investigation into how recent advancements in transfer learning can help solve data center resource management problems, (ii) development of a dynamic cache predictor using on-the-fly virtual machine measurements, (iii) creation of a dynamic memory predictor using runtime characteristics of a virtual machine, and (iv) development of a unified resource management scheme with a set of heuristics that dynamically adjust cache and memory allocation to fulfill Quality of Service goals. In tasks (i) through (iii), transfer learning methods are employed and explored to facilitate the transfer of knowledge and models to new system environments and applications, based on extensive training on existing systems and benchmark applications. The prediction models and management scheme will be evaluated on common benchmarks including SPEC WEB and CloudSuite 2.0. The results of this research will have broad impact on the design and implementation of cloud computing data centers. The results will help improve resource utilization, boost system throughput, and improve prediction performance in a cloud computing virtualization system. Additionally, the methods designed and the knowledge they impart will advance understanding in both systems research and machine learning.
