Tino Moore, a Department of Computer Science graduate student, will present his RQE lecture on Friday, April 9, 2021, at 2:00 p.m.
Poor Man’s Trace Cache: Static Trace Construction via Instruction Replication
We introduce Poor Man’s Trace Cache (PMTC), a novel variable-length branch delay slot architecture. PMTC constructs instruction traces in static code by replicating instructions into variable-size delay slots at assembly time. All unconditional direct control transfer instructions, and selected conditional direct ones, are given a variable-size delay slot. Delay slots, and the traces they contain, extend to the next cache line boundary, ensuring that each trace is fetched along with the control transfer instruction that initiated it. Branch, jump, and return instruction semantics, as well as the fetch unit architecture, are modified to use traces in delay slots when unused fetch slots are available. PMTC yields the following benefits: (1) Average fetch bandwidth increases, as the front end can fetch across taken control transfer instructions in a single cycle. (2) The dynamic number of instruction cache lines fetched by the processor is reduced, as multiple noncontiguous basic blocks along a single path are encountered in one fetch cycle. (3) Replicating a branch into delay slots along multiple paths yields path separability for the branch, which positively impacts branch predictor accuracy. The PMTC mechanism requires minimal modifications to the processor’s fetch unit, and the delay slot insertion algorithm can easily be implemented within the assembler without compiler support.
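
As a rough illustration of how a delay slot that "extends to the next cache line boundary" might be sized, the following Python sketch computes the number of slots after a control transfer instruction and fills them with instructions copied from the branch target. It is not the assembler algorithm from the talk. The 64-byte line size, the 4-byte fixed instruction width, and the names `delay_slot_count` and `build_trace` are all assumptions made for this example.

```python
LINE_BYTES = 64  # assumed instruction cache line size (not stated in the abstract)
INSN_BYTES = 4   # assumed fixed instruction width (not stated in the abstract)


def delay_slot_count(cti_addr: int) -> int:
    """Instruction slots between a control transfer instruction (CTI)
    at cti_addr and the end of the cache line that contains it."""
    next_insn = cti_addr + INSN_BYTES
    line_end = (cti_addr // LINE_BYTES + 1) * LINE_BYTES
    return (line_end - next_insn) // INSN_BYTES


def build_trace(cti_addr: int, target_insns: list[str]) -> list[str]:
    """Hypothetical helper: replicate as many target-path instructions
    as fit in the delay slot following the CTI."""
    return target_insns[:delay_slot_count(cti_addr)]


if __name__ == "__main__":
    # A CTI at the start of a line leaves the rest of the line as delay slot.
    print(delay_slot_count(0x1000))
    # A CTI in the last slot of a line has no room for a trace.
    print(delay_slot_count(0x103C))
    # Only one target instruction fits after a CTI at offset 56 in its line.
    print(build_trace(0x1038, ["add", "sub", "ld"]))
```

Under these assumptions, the size of each delay slot depends only on where the control transfer instruction falls within its cache line, which is consistent with the abstract's claim that the insertion step can run inside the assembler without compiler support.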