Ubiquitous High-Performance Computing (UHPC) and X-Stack Projects

The Ubiquitous High-Performance Computing (UHPC) project, funded by the Defense Advanced Research Projects Agency (DARPA), initiated research on energy-efficient, resilient, many-core computing with a 2018 horizon. With the end of Dennard scaling, better hardware and software were needed both to curb the energy consumption of future computers and to exploit a large number of cores in a single cabinet (up to 10^15 floating-point operations per second) while consuming no more than 50 kW. A thousand of those machines have the potential to reach one exaflop (10^18 floating-point operations per second). The hardware should expose several “knobs” to the software, allowing applications to adapt gracefully to a very dynamic environment and to expand or contract parallelism depending on constraints such as the maximal authorized power envelope, the desired energy efficiency, and the required minimal performance.
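As a quick check on these targets, the arithmetic can be written out. This is only a back-of-the-envelope sketch: it reads the cabinet figure as 10^15 floating-point operations per second, takes the 50 kW cap at face value, and ignores interconnect, cooling, and any other overhead.

```latex
\begin{align*}
  \text{aggregate performance} &= 10^{3} \times 10^{15}\ \text{FLOP/s} = 10^{18}\ \text{FLOP/s} \quad (\text{one exaflop}) \\
  \text{efficiency per cabinet} &= \frac{10^{15}\ \text{FLOP/s}}{5 \times 10^{4}\ \text{W}} = 2 \times 10^{10}\ \text{FLOP/s per W} = 20\ \text{GFLOP/s per W} \\
  \text{aggregate power} &= 10^{3} \times 50\ \text{kW} = 50\ \text{MW}
\end{align*}
```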

Following UHPC, the Department of Energy-funded X-Stack Software Research project recentered the objectives: it relied on traditional high-performance communication libraries such as the Message-Passing Interface (MPI) while revolutionizing both hardware and software at the compute-node level.

In both cases, it was deemed unlikely that traditional programming and execution models would be able to deal with novel hardware. Taking advantage of the parallelism offered by the target straw-man hardware platform would be impossible without new system software components.

The Codelet Model was then implemented in various runtime systems, and inspired the Intel-led X-Stack project to define the Open Community Runtime (OCR). The Codelet Model was used on various architectures, from the IBM Cyclops-64 general-purpose many-core processor, to regular x86 compute nodes, as well as the Intel straw-man architecture, Traleika Glacier. Depending on the implementations, codelet-based runtime systems run on shared-memory or distributed systems. They showed their potential on both classical scientific workloads based on linear algebra, and more recent (and irregular) ones such as graph-related parallel breadth-first search. To achieve good results, hierarchical parallelism and specific task-scheduling strategies were needed.
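To give a feel for the dataflow-style firing rule that codelet-based runtimes build on, here is a minimal sketch in Python. It is not the API of OCR, DARTS, or any other runtime mentioned here: the `Codelet` class, its `signal` method, and the `run` function are hypothetical names chosen for illustration, and the sketch is sequential, with no hierarchy and no real scheduling strategy. The only idea it shows is that a codelet becomes ready once all of its dependencies have been signaled.

```python
from collections import deque


class Codelet:
    """Hypothetical fine-grain task that fires once all dependencies are met."""

    def __init__(self, name, fn, num_deps):
        self.name = name
        self.fn = fn
        self.pending = num_deps   # dependencies not yet satisfied
        self.successors = []      # codelets to signal after this one runs

    def signal(self, ready):
        """Record one satisfied dependency; enqueue the codelet when none remain."""
        self.pending -= 1
        if self.pending == 0:
            ready.append(self)


def run(codelets):
    """Run codelets in dependency order and return the order in which they fired."""
    ready = deque(c for c in codelets if c.pending == 0)
    order = []
    while ready:
        codelet = ready.popleft()
        codelet.fn()              # codelets run to completion, without preemption
        order.append(codelet.name)
        for successor in codelet.successors:
            successor.signal(ready)
    return order


# Tiny graph: "c" multiplies the values produced by "a" and "b".
results = {}
a = Codelet("a", lambda: results.update(a=2), num_deps=0)
b = Codelet("b", lambda: results.update(b=3), num_deps=0)
c = Codelet("c", lambda: results.update(c=results["a"] * results["b"]), num_deps=2)
a.successors.append(c)
b.successors.append(c)

order = run([c, a, b])
```

In this toy graph, `c` cannot fire until both `a` and `b` have signaled it, whatever order the codelets are handed to `run`. A real runtime would replace the single queue with per-core or hierarchical schedulers.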

Self-awareness is a combination of introspection and adaptation mechanisms. Introspection is used to determine the health of the system, while adaptation changes system parameters, for example by making parts of the compute node consume less energy or by shutting down processing units. Both are driven by high-level goals expressed by the user, related to power and energy consumption, performance, and resilience.

The team studied how to perform fine-grain resource management to achieve self-awareness using codelets, and built a self-aware simulation tool to evaluate the benefits of various adaptive strategies.
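The introspection and adaptation loop described above can be sketched as a simple control step. This is an illustrative toy, not the team's simulation tool: `Goals`, `NodeState`, `introspect`, `adapt`, and the linear power and performance model (each active unit contributes a fixed amount) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Goals:
    """High-level user goals (hypothetical fields)."""
    max_power_w: float
    min_perf_gflops: float


@dataclass
class NodeState:
    """Toy compute node: every active unit adds a fixed power and performance."""
    active_units: int
    power_per_unit_w: float
    perf_per_unit_gflops: float

    def power(self):
        return self.active_units * self.power_per_unit_w

    def perf(self):
        return self.active_units * self.perf_per_unit_gflops


def introspect(node):
    """Introspection: report the current health of the node."""
    return {"power_w": node.power(), "perf_gflops": node.perf()}


def adapt(node, goals):
    """Adaptation: change one parameter per step to move toward the goals."""
    health = introspect(node)
    if health["power_w"] > goals.max_power_w and node.active_units > 1:
        node.active_units -= 1    # over the power cap: shut down one unit
        return "shut_down_unit"
    below_floor = health["perf_gflops"] < goals.min_perf_gflops
    has_headroom = health["power_w"] + node.power_per_unit_w <= goals.max_power_w
    if below_floor and has_headroom:
        node.active_units += 1    # under the performance floor: wake one unit
        return "wake_unit"
    return "steady"


node = NodeState(active_units=8, power_per_unit_w=10.0, perf_per_unit_gflops=5.0)
goals = Goals(max_power_w=60.0, min_perf_gflops=20.0)
actions = [adapt(node, goals) for _ in range(4)]
```

With these made-up numbers, the node starts at 80 W against a 60 W cap, so the loop shuts down two units and then holds steady at six. A realistic policy would also weigh energy efficiency and resilience, which this sketch leaves out.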

The TERAFLUX Project

The TERAFLUX project was funded by the European Union. It targeted so-called “teradevices,” devices featuring more than 1,000 cores on a single chip, whose architecture would make them near-impossible to exploit using traditional programming and execution models. DF-Threads, a novel execution model based on dataflow principles, was proposed to exploit such devices. A simulation infrastructure was used to demonstrate that such a solution has potential while remaining programmable. At the same time, it was important to maintain a certain level of compatibility with existing systems and with the features application programmers expect.

Both the DF-Threads and codelet models borrow from dataflow models of computation, but subtle differences between them require special care to bridge. Stéphane Zuckerman and his colleagues ported DARTS, their implementation of the Codelet Model, to the TERAFLUX simulator, and showed that a convergence path exists between the DF-Threads and codelet execution models. The research demonstrated the advantages of hardware-based, software-controlled multithreading with hardware scheduling units for scalability and performance.

Stéphane Zuckerman presented the results and outcomes of his research in peer-reviewed conferences and workshops.