Category: SAS

Junqiao Qiu to Present Lecture November 6

Assistant Professor Junqiao Qiu, Computer Science, will present his lecture, “Speculative Parallelization for FSM-centric Computations,” this Friday, Nov. 6, 2020, at 3:00 p.m., via online meeting.

Lecture Abstract

As a fundamental computation model, finite-state machine (FSM) has been used in a wide range of data-intensive applications, including malware detection, bioinformatics, semi-structured data analytics, natural language processing and even machine learning. However, FSM execution is known to be “embarrassingly sequential” due to the state dependences among transitions. Current studies find that speculation is a promising solution to address the inherent dependencies in FSM computations and thus enables scalable FSM parallelization.
This talk will firstly introduce the fundamental scalability bottleneck in the current FSM parallelization, and then an aggressive speculation, a generalized speculation model that allows a speculated state to be validated against the result from another speculation, is proposed to address the scalability limitations. Finally, this talk will discuss the possibility to enlarge the applicability of the proposed approach and go beyond the FSM-based computations.

Juneiao Qiu is a member of the Institute of Computing and Cybersystems’ (ICC) Center for Scalable Architectures and Systems (SAS).

Hongyu An: Curious About the World and Exploring the Unknown

by Karen S. Johnson, Communications Director, ICC

“A scientist should be a person who is always curious about nature and the world, and who tries to explore the unknown.” –Hongyu An, Assistant Professor, Electrical and Computer Engineering

Hongyu An, Assistant Professor, ECE

Exploring science and technology is always exciting for new Assistant Professor Hongyu An, Electrical and Computer Engineering. He says he is “very pleased to have the chance to mentor the next generation and share my knowledge and experience with undergraduate and graduate students.”

Several things drew Hongyu An to Michigan Tech, including his observation that as an institution Michigan Tech cares about its employees. “The excellent professors, smart students, and the supportive environment are the main reasons I joined Michigan Tech,” he says. “As a new faculty member, I am facing a lot of new challenges. There is great support in my department (ECE) and through the ICC.”

Hongyu is a member of two Institute of Computing and Cybersystems (ICC) research centers: Human-Centered Computing and Scalable Architectures and Systems. He also sees synergies with the Center for Cyber-Physical Systems.

“It is my great pleasure and honor to be a member of the ICC,” Hongyu says. “ I can collaborate with the experts in HCC for exploring the brain and artificial intelligence, and the professors in SAS for hardware and architecture designs. Moreover, the neuromorphic chips I am working on can potentially be applied to Cyber-Physical Systems.”

Hongyu’s primary research area is hardware design for AI and neuromorphic systems. He believes that Artificial Intelligence is probably one of the most challenging research topics in science, noting that recent work in deep learning and artificial neural networks is demonstrating great progress in approaching artificial intelligence. 

“But the traditional computers under von Neumann architecture cannot keep up with the development of neural networks and deep learning,” he cautions. “My research is addressing this challenge by using a new hardware design, from device to architecture levels.”

Hongyu’s teaching interests include VLSI, Circuits, and Electromagnetics. Desribing his teaching philosophy, he notes that making complicated things simple is more challenging than making simple things complicated, and that he strives for the former. This academic year, An is teaching EE 4271 VLSI Design and mentoring ECE master’s student, Sarvani Marthi Sarvani, whose project aims to design a silicon retina through CMOS and Memristors.

Hongyu and his research team are also investigating associative memory learning, a new learning method that aims to create a neuromorphic system that can learn from its surroundings directly. 

“Associative memory is a widespread self-learning method in biological livings, which enables the nervoussystem to remember the relationship between two concurrent events,” Hongyu explains. “Through this learning method, dogs can learn the sound of bells as a sign of food; people can remember a word representing an object.”

“The significance of rebuilding associative memory at a behavioral level not only reveals a way of designing a brain-like, self-learning neuromorphic system, it is also to explore a method of comprehending the learning mechanism of a nervous system,” he adds.

And finally, beyond his work as a professor and scientist Hongyu hopes that he is “a good husband to my wife, a good father to my sons, and a good son to my parents.”

Hongyu completed his Ph.D. in electrical engineering at Virginia Tech, his M.S. in electrical engineering at Missouri University of Science and Technology, and his B.S. in electrical engineering at Shenyang University of Technology.

Recent Publications

An, Hongyu, Mohammad Shah Al-Mamun, Marius K. Orlowski, Lingjia Liu, and Yang Yi. “Robust Deep Reservoir Computing through Reliable Memristor with Improved Heat Dissipation Capability. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2020).

An, Hongyu, Qiyuan An, and Yang Yi. “Realizing Behavior Level Associative Memory Learning Through Three-Dimensional Memristor-Based Neuromorphic Circuits. IEEE Transactions on Emerging Topics in Computational Intelligence (2019).

Founded in 2015, the Institute of Computing and Cybersystems (ICC) promotes collaborative, cross-disciplinary research and learning experiences in the areas of computing education, cyber-physical systems, cybersecurity, data sciences, human-centered computing, and scalable architectures and systems, for the benefit of Michigan Technological University and society at large.

The ICC creates and supports an arena in which faculty and students work collaboratively across organizational boundaries in an environment that mirrors contemporary technological innovation. The ICC’s 55 members represent more than 20 academic disciplines at Michigan Tech.

Soner Onder and Dave Whalley Investigate Instruction-level Parallelism

From Florida State University News

A Florida State University researcher is working to make computer processors execute applications in a more energy-efficient manner with the help of a new $1.2 million grant from the National Science Foundation.

Professor Dave Whalley, Florida State University

“The general goal is to increase performance but to do it in a manner that is more energy efficient than the dominant computer processors that are in use today,” Professor of Computer Science David Whalley said.

To do that, Whalley and his colleague Soner Onder, a professor at Michigan Technological University, hope to more efficiently exploit what’s called instruction-level parallelism, or the ability of a computer to simultaneously execute multiple machine instructions.

Professor Soner Onder, Michigan Tech Department of Computer Science
Professor Soner Onder, Michigan Tech Department of Computer Science

“In general, VLIW processors are more energy efficient but cannot approach the performance of OoO processors except in limited domains, such as digital signal processing,” Whalley said.

Whalley’s project, called SCALE for Statically Controlled Asynchronous Lane Execution, is designed to overcome these current limitations. SCALE supports separate execution lanes, so that instructions in separate lanes can execute in parallel and dependencies between instructions in different lanes are identified by the compiler to synchronize these lanes when necessary.

“Providing distinct lanes of instructions allows the compiler to generate code for different modes of execution to adapt to the type of parallelism that is available at each point within an application,” Whalley said.

The grant began this fall and will run through August 2023. Half of the funding will come to Florida State, with the other half supporting Onder’s part of the work at Michigan Technological University. The FSU portion will support two graduate students in computer science.

Soner Onder Receives Year One Funding for $1.2M NSF SCALE Project

Soner Onder

Dave Whalley

Soner Onder, professor of computer science, was recently awarded $246,329 for the first year of a four-year NSF grant for his project, “SHF: Medium: Collaborative Research: Statically Controlled Asynchronous Lane Execution (SCALE).” The project is in collaboration with Prof. David Whalley of Florida State University. Michigan Tech is the lead institution in the project, it is expected to total $1.2 million, with Michigan Tech receiving $600,000.

Abstract: Enabling better performing systems benefits applications that span those running on mobile devices to large data applications running on data centers. The efficiency of most applications is still primarily affected by single thread performance. Instruction-level parallelism (ILP) speeds up programs by executing instructions of the program in parallel, with ‘superscalar’ processors achieving maximum performance. At the same time, energy efficiency is a key criteria to keep in mind as such speedup happens, with these two being conflicting criteria in system design. This project develops a Statically Controlled Asynchronous Lane Execution (SCALE) approach that has the potential to meet or exceed the performance of a traditional superscalar processor while approaching the energy efficiency of a very long instruction word (VLIW) processor. As implied by its name, the SCALE approach has the ability to scale to different types and levels of parallelism. The toolset and designs developed in this project will be available as open-source and will also have an impact on both education and research. The SCALE architectural and compiler techniques will be included in undergraduate and graduate curricula.

The SCALE approach supports separate asynchronous execution lanes where dependencies between instructions in different lanes are statically identified by the compiler to provide inter-lane synchronization. Providing distinct lanes of instructions allows the compiler to generate code for different modes of execution to adapt to the type of parallelism that is available at each point within an application. These execution modes include explicit packaging of parallel instructions, parallel and pipelined execution of loop iterations, single program multiple data (SPMD) execution, and independent multi-threading.

This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.

https://www.nsf.gov/awardsearch/showAward?AWD_ID=1901005&HistoricalAwards=false

Soner Onder Presents Keynote at SAMOS XIX

Soner Onder
Soner Onder

Soner Onder (SAS), professor of computer science, presented a keynote lecture July 8, 2019, at the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XIX) on Samos Island, Greece, which was held July 7-11. Onder’s talk was titled, “Form Follows Function: The Case for Homogeneous Computer Architectures.” Onder also participated in the conference’s “The Annual Open Mike Panel.”

Keynote Lecture Abstract: ”Form follows function” is a principle associated with 20th-century modernist architecture and industrial design which says that the shape of a building or object should primarily relate to its intended function or purpose”[2]. For best performance in computer architecture, form must follow function as well. What are form and function in computer architecture? Form is easy to understand and interpret in its dictionary meaning; Function is not so clear-cut. In this talk, I will start with a simple problem, an algorithm, and a basic program representation that will be interpreted by the machine, and show that delivering high performance rests on solving only a handful, but fundamentally difficult problems. I will argue that the mere existence of domain specific solutions that general purpose computing cannot match in performance is a testament that the general purpose computing is ”not general enough”. What makes an architecture ”not general enough” is not the architecture itself, but rather the mismatch between the function its form had followed and the actual semantics of programs. To illustrate the point, I will challenge the widely understood interpretation of instruction-level parallelism (ILP) as ”single-thread performance”, and show that this interpretation is too short-sighted. We can efficiently exploit all types of available parallelism, including process-level, thread-level and data level parallelism, all at the instruction-level, and this approach is both feasible and necessary to combat the complexity that is plaguing our profession. I will then discuss why an executable single-assignment program representation [1] may be the ultimate function whose implementations may result in homogeneous general purpose architectures that can potentially match the performance of accelerators for specific tasks, while exceeding the performance of any accelerator traditional architecture combination for general tasks. I will conclude by discussing our results with Demand-driven Execution (DDE), whose form follows this single-assignment program representation.

About SAMOS (from http://samos-conference.com/): SAMOS is a unique conference. It deals with embedded systems (sort of) but that is not what makes it different. It brings together every year researchers from both academia and industry on the quiet and inspiring northern mountainside of the Mediterranean island of Samos, which in itself is different. But more importantly, it really fosters collaboration rather than competition. Formal and intensive technical sessions are only held in the mornings.A lively panel or distinguished keynote speaker ends the formal part of the day, and leads nicely into the afternoons and evenings — reserved for informal discussions, good food, and the inviting Aegean Sea. The conference papers will be published by Springer’s Lecture Notes in Computer Science – LNCS and will be included in the DBLP Database.

Samos Island, Greece

Samos Island, Greece

Soner Onder Presents Talk in Barcelona, Spain

Soner Onder is pictured on the right in the front.

Sonder Onder (SAS), professor of computer science, presented an invited talk at “Yale:80: Pushing the Envelope of Computing for the Future,” held July 1-2, 2019, in Barcelona, Spain. The workshop was organized by Universitat Politècnica de Catalunya in honor of the 80th birthday of Yale Patt, a prominent computer architecture researcher. Onder was one of 23 invitees to give a talk. His lecture was titled, “Program semantics meets architecture: What if we did not have branches?”

View the slides from Onder’s talk: Yale80-in-2019-Soner-Onder

Yale Patt is a professor in the Department of Electrical & Computer Engineering at The University of Texas at Austin, where he holds the Ernest Cockrell, Jr. Centennial Chair in Engineering. He also holds the title of University Distinguished Teaching Professor. Patt was elected to the National Academy of Engineering in 2014, among the highest professional distinctions bestowed upon an engineer. View Patt’s faculty webpage at: http://www.ece.utexas.edu/people/faculty/yale-patt.

Link to the workshop’s website here: http://research.ac.upc.edu/80-in-2019/

Visit the workshop’s Facebook page here: https://www.facebook.com/BSCCNS/posts/workshop-yale-80-in-2019pushing-the-envelope-of-computing-for-the-futurehttprese/2217508564992996/

Soner Onder, Barcelona, Spain

Soner Onder at Sagrada Família, Barcelona, Spain

Ali Ebnenasir is Co-author of Article in ACM Transactions on Computational Logic

Ali EbnenasirAli Ebnenasir (SAS/CS), professor of computer science, is co-author of the article, “On the verification of livelock-freedom and self-stabilization on parameterized rings,” published in the July 2019 issue of the journal ACM Transactions on Computational Logic. The article is co-authored by Alex Klinkhamer of Google.

Abstract: This article investigates the verification of livelock-freedom and self-stabilization on parameterized rings consisting of symmetric, constant space, deterministic, and self-disabling processes. The results of this article have a significant impact on several fields, including scalable distributed systems, resilient and self-* systems, and verification of parameterized systems. First, we identify necessary and sufficient local conditions for the existence of global livelocks in parameterized unidirectional rings with unbounded (but finite) number of processes under the interleaving semantics. Using a reduction from the periodic domino problem, we show that, in general, verifying livelock-freedom of parameterized unidirectional rings is undecidable (specifically, Π10-complete) even for constant space, deterministic, and self-disabling processes. This result implies that verifying self-stabilization for parameterized rings of self-disabling processes is also undecidable. We also show that verifying livelock-freedom and self-stabilization remain undecidable under (1) synchronous execution semantics, (2) the FIFO consistency model, and (3) any scheduling policy. We then present a new scope-based method for detecting and constructing livelocks in parameterized rings. The proposed semi-algorithm behind our scope-based verification is based on a novel paradigm for the detection of livelocks that totally circumvents state space exploration. Our experimental results on an implementation of the proposed semi-algorithm are very promising as we have found livelocks in parameterized rings in a few microseconds on a regular laptop. The results of this article have significant implications for scalable distributed systems with cyclic topologies.

Citation: Klinkhamer, A., & Ebnenasir, A. (2019). On the verification of livelock-freedom and self-stabilization on parameterized rings. ACM Transactions on Computational Logic, 20(3), 16:1-16:36. http://dx.doi.org/10.1145/3326456

MTU Digital Commons link: https://digitalcommons.mtu.edu/michigantech-p/146/

ACM link: https://dl.acm.org/citation.cfm?doid=3338853.3326456

Saeid Nooshabadi is Co-author of Article in Journal of Parallel and Distributed Computing

Saeid Nooshabadi

Saeid Nooshabadi (SAS/ECE), professor of electrical and computer engineering, is co-author of the article, “High-dimensional image descriptor matching using highly parallel KD-tree construction and approximate nearest neighbor search,” to be published in  the October 2019 issue of the Journal of Parallel and Distributed Computing, which is published by Elsevier. The article is co-authored by Michigan Tech Computer Science department doctoral candidate Linjia Hu.

Abstract: To overcome the high computational cost associated with the high-dimensional digital image descriptor matching, this paper presents a set of integrated parallel algorithms for the construction of K-dimensional tree (KD-tree) and P approximate nearest neighbor search (P-ANNS) on the modern massively parallel architectures (MPA). To improve the runtime performance of the P-ANNS, we propose an efficient sliding window for a parallel buffered P-ANNS on KD-tree to mitigate the high cost of global memory accesses. When applied to high dimensional real-world image descriptor datasets, the proposed KD-tree construction and the buffered P-ANNS algorithms are of comparable matching quality to the traditional sequential counterparts on CPU, while outperforming their serial CPU counterparts by speedup factors of up to 17 and 163, respectively. The algorithms are also studied for the performance impact factors to obtain the optimal runtime configurations for various datasets. Moreover, we verify the features of the parallel algorithms on typical 3D image matching scenarios. With the classical local image descriptor signature of histograms of orientations (SHOT) datasets, the parallel KD-tree construction and image descriptor matching can achieve up to 11 and 138-fold speedups, respectively.

Citation: Hu, L., & Nooshabadi, S. (2019). High-dimensional image descriptor matching using highly parallel KD-tree construction and approximate nearest neighbor search. Journal of Parallel and Distributed Computing, 132, 127-140. http://dx.doi.org/10.1016/j.jpdc.2019.06.003

MTU Digital Commons link: https://digitalcommons.mtu.edu/michigantech-p/145/

Elsevier link: https://www.sciencedirect.com/science/article/pii/S0743731519304319?via%3Dihub

Zhenlin Wang is Co-Author of Article in Parallel Programming Journal

Zhenlin Wang (SAS), professor of computer science, is co-author of the article, “Lightweight and accurate memory allocation in key-value cache,” published in the June 2019 issue of the International Journal of Parallel Programming, which is published by Springer.

Abstract: The use of key-value caches in modern web servers is becoming more and more ubiquitous. Representatively, Memcached as a widely used key-value cache system, originally intended for speeding up dynamic web applications by alleviating database load. One of the key factors affecting the performance of Memcached is the memory allocation among different item classes. How to obtain the most efficient partitioning scheme with low time and space consumption is a focus of attention. In this paper, we propose a lightweight and accurate memory allocation scheme in Memcached, by sampling access patterns, analyzing data locality, and reassigning the memory space. One early study on optimizing memory allocation is LAMA, which uses footprint-based MRC to optimize memory allocation in Memcached. However, LAMA does not model deletion operations in Memcached and its spatial overhead is quite large. We propose a method that consumes only 3% of LAMA space and can handle read, write and deletion operations. Moreover, evaluation results show that the average stable-state miss ratio is reduced by 15.0% and the average stable-state response time is reduced by 12.3% when comparing our method to LAMA.

Citation: Pan, C., Zhou, L., Luo, Y., Wang, X., & Wang, Z. (2019). Lightweight and accurate memory allocation in key-value cache. International Journal of Parallel Programming, 47(3), 451-466.http://dx.doi.org/10.1007/s10766-018-0616-4

Digital Commons link: https://digitalcommons.mtu.edu/michigantech-p/144/

Springer link: https://link.springer.com/article/10.1007%2Fs10766-018-0616-4