Category Archives: SAS

Soner Onder Receives Year One Funding for $1.2M NSF SCALE Project

Soner Onder
Dave Whalley

Soner Onder, professor of computer science, was recently awarded $246,329 for the first year of a four-year NSF grant for his project, “SHF: Medium: Collaborative Research: Statically Controlled Asynchronous Lane Execution (SCALE).” The project is in collaboration with Prof. David Whalley of Florida State University. Michigan Tech is the lead institution in the project, it is expected to total $1.2 million, with Michigan Tech receiving $600,000.

Abstract: Enabling better performing systems benefits applications that span those running on mobile devices to large data applications running on data centers. The efficiency of most applications is still primarily affected by single thread performance. Instruction-level parallelism (ILP) speeds up programs by executing instructions of the program in parallel, with ‘superscalar’ processors achieving maximum performance. At the same time, energy efficiency is a key criteria to keep in mind as such speedup happens, with these two being conflicting criteria in system design. This project develops a Statically Controlled Asynchronous Lane Execution (SCALE) approach that has the potential to meet or exceed the performance of a traditional superscalar processor while approaching the energy efficiency of a very long instruction word (VLIW) processor. As implied by its name, the SCALE approach has the ability to scale to different types and levels of parallelism. The toolset and designs developed in this project will be available as open-source and will also have an impact on both education and research. The SCALE architectural and compiler techniques will be included in undergraduate and graduate curricula.

The SCALE approach supports separate asynchronous execution lanes where dependencies between instructions in different lanes are statically identified by the compiler to provide inter-lane synchronization. Providing distinct lanes of instructions allows the compiler to generate code for different modes of execution to adapt to the type of parallelism that is available at each point within an application. These execution modes include explicit packaging of parallel instructions, parallel and pipelined execution of loop iterations, single program multiple data (SPMD) execution, and independent multi-threading.

This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.

https://www.nsf.gov/awardsearch/showAward?AWD_ID=1901005&HistoricalAwards=false


Soner Onder Presents Keynote at SAMOS XIX

Soner Onder
Soner Onder

Soner Onder (SAS), professor of computer science, presented a keynote lecture July 8, 2019, at the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XIX) on Samos Island, Greece, which was held July 7-11. Onder’s talk was titled, “Form Follows Function: The Case for Homogeneous Computer Architectures.” Onder also participated in the conference’s “The Annual Open Mike Panel.”

Keynote Lecture Abstract: ”Form follows function” is a principle associated with 20th-century modernist architecture and industrial design which says that the shape of a building or object should primarily relate to its intended function or purpose”[2]. For best performance in computer architecture, form must follow function as well. What are form and function in computer architecture? Form is easy to understand and interpret in its dictionary meaning; Function is not so clear-cut. In this talk, I will start with a simple problem, an algorithm, and a basic program representation that will be interpreted by the machine, and show that delivering high performance rests on solving only a handful, but fundamentally difficult problems. I will argue that the mere existence of domain specific solutions that general purpose computing cannot match in performance is a testament that the general purpose computing is ”not general enough”. What makes an architecture ”not general enough” is not the architecture itself, but rather the mismatch between the function its form had followed and the actual semantics of programs. To illustrate the point, I will challenge the widely understood interpretation of instruction-level parallelism (ILP) as ”single-thread performance”, and show that this interpretation is too short-sighted. We can efficiently exploit all types of available parallelism, including process-level, thread-level and data level parallelism, all at the instruction-level, and this approach is both feasible and necessary to combat the complexity that is plaguing our profession. I will then discuss why an executable single-assignment program representation [1] may be the ultimate function whose implementations may result in homogeneous general purpose architectures that can potentially match the performance of accelerators for specific tasks, while exceeding the performance of any accelerator traditional architecture combination for general tasks. I will conclude by discussing our results with Demand-driven Execution (DDE), whose form follows this single-assignment program representation.

About SAMOS (from http://samos-conference.com/): SAMOS is a unique conference. It deals with embedded systems (sort of) but that is not what makes it different. It brings together every year researchers from both academia and industry on the quiet and inspiring northern mountainside of the Mediterranean island of Samos, which in itself is different. But more importantly, it really fosters collaboration rather than competition. Formal and intensive technical sessions are only held in the mornings.A lively panel or distinguished keynote speaker ends the formal part of the day, and leads nicely into the afternoons and evenings — reserved for informal discussions, good food, and the inviting Aegean Sea. The conference papers will be published by Springer’s Lecture Notes in Computer Science – LNCS and will be included in the DBLP Database.

Samos Island, Greece
Samos Island, Greece

Soner Onder Presents Talk in Barcelona, Spain

Soner Onder is pictured on the right in the front.

Sonder Onder (SAS), professor of computer science, presented an invited talk at “Yale:80: Pushing the Envelope of Computing for the Future,” held July 1-2, 2019, in Barcelona, Spain. The workshop was organized by Universitat Politècnica de Catalunya in honor of the 80th birthday of Yale Patt, a prominent computer architecture researcher. Onder was one of 23 invitees to give a talk. His lecture was titled, “Program semantics meets architecture: What if we did not have branches?”

View the slides from Onder’s talk: Yale80-in-2019-Soner-Onder

Yale Patt is a professor in the Department of Electrical & Computer Engineering at The University of Texas at Austin, where he holds the Ernest Cockrell, Jr. Centennial Chair in Engineering. He also holds the title of University Distinguished Teaching Professor. Patt was elected to the National Academy of Engineering in 2014, among the highest professional distinctions bestowed upon an engineer. View Patt’s faculty webpage at: http://www.ece.utexas.edu/people/faculty/yale-patt.

Link to the workshop’s website here: http://research.ac.upc.edu/80-in-2019/

Visit the workshop’s Facebook page here: https://www.facebook.com/BSCCNS/posts/workshop-yale-80-in-2019pushing-the-envelope-of-computing-for-the-futurehttprese/2217508564992996/

Soner Onder, Barcelona, Spain
Soner Onder at Sagrada Família, Barcelona, Spain

Ali Ebnenasir is Co-author of Article in ACM Transactions on Computational Logic

Ali EbnenasirAli Ebnenasir (SAS/CS), professor of computer science, is co-author of the article, “On the verification of livelock-freedom and self-stabilization on parameterized rings,” published in the July 2019 issue of the journal ACM Transactions on Computational Logic. The article is co-authored by Alex Klinkhamer of Google.

Abstract: This article investigates the verification of livelock-freedom and self-stabilization on parameterized rings consisting of symmetric, constant space, deterministic, and self-disabling processes. The results of this article have a significant impact on several fields, including scalable distributed systems, resilient and self-* systems, and verification of parameterized systems. First, we identify necessary and sufficient local conditions for the existence of global livelocks in parameterized unidirectional rings with unbounded (but finite) number of processes under the interleaving semantics. Using a reduction from the periodic domino problem, we show that, in general, verifying livelock-freedom of parameterized unidirectional rings is undecidable (specifically, Π10-complete) even for constant space, deterministic, and self-disabling processes. This result implies that verifying self-stabilization for parameterized rings of self-disabling processes is also undecidable. We also show that verifying livelock-freedom and self-stabilization remain undecidable under (1) synchronous execution semantics, (2) the FIFO consistency model, and (3) any scheduling policy. We then present a new scope-based method for detecting and constructing livelocks in parameterized rings. The proposed semi-algorithm behind our scope-based verification is based on a novel paradigm for the detection of livelocks that totally circumvents state space exploration. Our experimental results on an implementation of the proposed semi-algorithm are very promising as we have found livelocks in parameterized rings in a few microseconds on a regular laptop. The results of this article have significant implications for scalable distributed systems with cyclic topologies.

Citation: Klinkhamer, A., & Ebnenasir, A. (2019). On the verification of livelock-freedom and self-stabilization on parameterized rings. ACM Transactions on Computational Logic, 20(3), 16:1-16:36. http://dx.doi.org/10.1145/3326456

MTU Digital Commons link: https://digitalcommons.mtu.edu/michigantech-p/146/

ACM link: https://dl.acm.org/citation.cfm?doid=3338853.3326456


Saeid Nooshabadi is Co-author of Article in Journal of Parallel and Distributed Computing

Saeid Nooshabadi

Saeid Nooshabadi (SAS/ECE), professor of electrical and computer engineering, is co-author of the article, “High-dimensional image descriptor matching using highly parallel KD-tree construction and approximate nearest neighbor search,” to be published in  the October 2019 issue of the Journal of Parallel and Distributed Computing, which is published by Elsevier. The article is co-authored by Michigan Tech Computer Science department doctoral candidate Linjia Hu.

Abstract: To overcome the high computational cost associated with the high-dimensional digital image descriptor matching, this paper presents a set of integrated parallel algorithms for the construction of K-dimensional tree (KD-tree) and P approximate nearest neighbor search (P-ANNS) on the modern massively parallel architectures (MPA). To improve the runtime performance of the P-ANNS, we propose an efficient sliding window for a parallel buffered P-ANNS on KD-tree to mitigate the high cost of global memory accesses. When applied to high dimensional real-world image descriptor datasets, the proposed KD-tree construction and the buffered P-ANNS algorithms are of comparable matching quality to the traditional sequential counterparts on CPU, while outperforming their serial CPU counterparts by speedup factors of up to 17 and 163, respectively. The algorithms are also studied for the performance impact factors to obtain the optimal runtime configurations for various datasets. Moreover, we verify the features of the parallel algorithms on typical 3D image matching scenarios. With the classical local image descriptor signature of histograms of orientations (SHOT) datasets, the parallel KD-tree construction and image descriptor matching can achieve up to 11 and 138-fold speedups, respectively.

Citation: Hu, L., & Nooshabadi, S. (2019). High-dimensional image descriptor matching using highly parallel KD-tree construction and approximate nearest neighbor search. Journal of Parallel and Distributed Computing, 132, 127-140. http://dx.doi.org/10.1016/j.jpdc.2019.06.003

MTU Digital Commons link: https://digitalcommons.mtu.edu/michigantech-p/145/

Elsevier link: https://www.sciencedirect.com/science/article/pii/S0743731519304319?via%3Dihub


Zhenlin Wang is Co-Author of Article in Parallel Programming Journal

Zhenlin Wang (SAS), professor of computer science, is co-author of the article, “Lightweight and accurate memory allocation in key-value cache,” published in the June 2019 issue of the International Journal of Parallel Programming, which is published by Springer.

Abstract: The use of key-value caches in modern web servers is becoming more and more ubiquitous. Representatively, Memcached as a widely used key-value cache system, originally intended for speeding up dynamic web applications by alleviating database load. One of the key factors affecting the performance of Memcached is the memory allocation among different item classes. How to obtain the most efficient partitioning scheme with low time and space consumption is a focus of attention. In this paper, we propose a lightweight and accurate memory allocation scheme in Memcached, by sampling access patterns, analyzing data locality, and reassigning the memory space. One early study on optimizing memory allocation is LAMA, which uses footprint-based MRC to optimize memory allocation in Memcached. However, LAMA does not model deletion operations in Memcached and its spatial overhead is quite large. We propose a method that consumes only 3% of LAMA space and can handle read, write and deletion operations. Moreover, evaluation results show that the average stable-state miss ratio is reduced by 15.0% and the average stable-state response time is reduced by 12.3% when comparing our method to LAMA.

Citation: Pan, C., Zhou, L., Luo, Y., Wang, X., & Wang, Z. (2019). Lightweight and accurate memory allocation in key-value cache. International Journal of Parallel Programming, 47(3), 451-466.http://dx.doi.org/10.1007/s10766-018-0616-4

Digital Commons link: https://digitalcommons.mtu.edu/michigantech-p/144/

Springer link: https://link.springer.com/article/10.1007%2Fs10766-018-0616-4


Ali Ebnenasir is Co-Author of Publication in ACM Transactions on Computational Logic

Ali Ebnenasir

An article co-authored by Ali Ebnenasir (SAS/CS) and Alex Klinkhamer, “Verification of Livelock-Freedom and Self-Stabilization on Parameterized Rings,” was recently published in ACM Transactions on Computational Logic.

Abstract: This article investigates the verification of livelock-freedom and self-stabilization on parameterized rings consisting of symmetric, constant space, deterministic, and self-disabling processes. The results of this article have a significant impact on several fields, including scalable distributed systems, resilient and self-* systems, and verification of parameterized systems. First, we identify necessary and sufficient local conditions for the existence of global livelocks in parameterized unidirectional rings with unbounded (but finite) number of processes under the interleaving semantics. Using a reduction from the periodic domino problem, we show that, in general, verifying livelock-freedom of parameterized unidirectional rings is undecidable (specifically, Π10-complete) even for constant space, deterministic, and self-disabling processes. This result implies that verifying self-stabilization for parameterized rings of self-disabling processes is also undecidable. We also show that verifying livelock-freedom and self-stabilization remain undecidable under (1) synchronous execution semantics, (2) the FIFO consistency model, and (3) any scheduling policy. We then present a new scope-based method for detecting and constructing livelocks in parameterized rings. The proposed semi-algorithm behind our scope-based verification is based on a novel paradigm for the detection of livelocks that totally circumvents state space exploration. Our experimental results on an implementation of the proposed semi-algorithm are very promising as we have found livelocks in parameterized rings in a few microseconds on a regular laptop. The results of this article have significant implications for scalable distributed systems with cyclic topologies.

https://dl.acm.org/citation.cfm?id=3326456&dl=ACM&coll=DL

doi: 10.1145/3326456


Scalable Spectral Sparsification of Graph Laplacians and Integrated Circuits

Circuit board

Researcher: Zhuo Feng, Associate Professor, Electrical and Computer Engineering

Sponsor: National Science Foundation: SHF: Small

Amount of Support: $450,000

Duration of Support: 3 years

Abstract: This research is motivated by investigations on scalable methods for design simplifications of nanoscale integrated circuits (ICs). This is to be achieved by extending the associated spectral graph sparsification framework to handle Laplacian-like matrices derived from general nonlinear IC modeling and simulation problems. The results from this research may prove to be key to the development of highly scalable computer-aided design algorithms for modeling, simulation, design, optimization, as well as verification of future nanoscale ICs that can easily involve multi-billions of circuit components. The algorithms and methodologies developed will be disseminated to leading technology companies that may include semiconductor and Electronic Design Automation companies as well as social and network companies, for potential industrial deployments.

Spectral graph sparsification aims to find an ultra-sparse subgraph (a.k.a. sparsifier) such that its Laplacian can well approximate the original one in terms of its eigenvalues and eigenvectors. Since spectrally similar subgraphs can approximately preserve the distances, much faster numerical and graph-based algorithms can be developed based on these “spectrally” sparsified networks. A nearly-linear complexity spectral graph sparsification algorithm is to be developed based on a spectral perturbation approach. The proposed method is highly scalable and thus can be immediately leveraged for the development of nearly-linear time sparse matrix solvers and spectral graph (data) partitioning (clustering) algorithms for large real-world graph problems in general. The results of the research may also influence a broad range of computer science and engineering problems related to complex system/network modeling, numerical linear algebra, optimization, machine learning, computational fluid dynamics, transportation and social networks, etc.

More details.


Improving Reliability of In-Memory Storage

Electronic circuit board

Researcher: Jianhui Yue, PI, Assistant Professor, Computer Science

Sponsor: National Science Foundation, SHF: Small: Collaborative Research

Amount of Support: $192, 716

Duration of Support: 3 years

Abstract: Emerging nonvolatile memory (NVM) technologies, such as PCM, STT-RAM, and memristors, provide not only byte-addressability, low-latency reads and writes comparable to DRAM, but also persistent writes and potentially large storage capacity like an SSD. These advantages make NVM likely to be next-generation fast persistent storage for massive data, referred to as in-memory storage. Yet, NVM-based storage has two challenges: (1) Memory cells have limited write endurance (i.e., the total number of program/erase cycles per cell); (2) NVM has to remain in a consistent state in the event of a system crash or power loss. The goal of this project is to develop an efficient in-memory storage framework that addresses these two challenges. This project will take a holistic approach, spanning from low-level architecture design to high-level OS management, to optimize the reliability, performance, and manageability of in-memory storage. The technical approach will involve understanding the implication and impact of the write endurance issue when cutting-edge NVM is adopted into storage systems. The improved understanding will motivate and aid the design of cost-effective methods to improve the life-time of in-memory storage and to achieve efficient and reliable consistence maintenance.

Publications:

Pai Chen, Jianhui Yue, Xiaofei Liao, Hai Jin. “Optimizing DRAM Cache by a Trade-off between Hit Rate and Hit Latency,” IEEE Transactions on Emerging Topics in Computing, 2018. doi:10.1109/TETC.2018.2800721

Chenlei Tang, Jiguang Wan, Yifeng Zhu, Zhiyuan Liu, Peng Xu, Fei Wu and Changsheng Xie. “RAFS: A RAID-Aware File System to Reduce Parity Update Overhead for SSD RAID,” Design Automation Test In Europe Conference (DATE) 2019, 2019.

Pai Chen, Jianhui Yue, Xiaofei Liao, Hai Jin. “Trade-off between Hit Rate and Hit Latency for Optimizing DRAM Cache,” IEEE Transactions on Emerging Topics in Computing, 2018.

More details


Zhou Feng is PI on $500K NSF Project

Zhuo Feng (ECE/ICC) is Principal Investigator on a project that has received a $500,000 research and development grant from the National Science Foundation. This potential three-year project is titled, “SHF: Small: Spectral Reduction of Large Graphs and Circuit Networks.”

This research project will investigate a truly-scalable yet unified spectral graph reduction approach that allows reducing large-scale, real-world directed and undirected graphs with guaranteed preservation of the original graph spectra. Unlike prior methods that are only suitable for handling specific types of graphs (e.g. undirected or strongly-connected graphs), this project uses a more universal approach and thus will allow for spectral reduction of a much wider range of real-world graphs that may involve billions of elements:

  • spectrally-reduced social (data) networks allow for more efficiently modeling, mining and analysis of large social (data) networks;
  • spectrally-reduced neural networks allow for more scalable model training and processing in emerging machine learning tasks;
  • spectrally-reduced web-graphs allow for much faster computations of personalized PageRank vectors;
  • spectrally-reduced integrated circuit networks will lead to more efficient partitioning, modeling, simulation, optimization and verification of large chip designs, etc.

From Tech Today, June 21, 2019