Compared to other approaches, the paper considers memory, buses,
decompression unit and the CPU in evaluating the energy consumption of
the system. The use of a post-cache architecture allows the compressed
instructions to reside in the cache and hence yield a major benefit of
higher hit ratio. To differentiate among the various groups of
compressed instructions, a separate code is appended to each
instruction. This means that an instruction left uncompressed occupies
more space in its "compressed" form than in its original form. Such
instructions also incur an additional delay while their group is
determined, which adds to the overall execution time. Because of this,
the overall performance/power
improvement of the system will depend on the percentage of
instructions in the particular application that fall into the last
group. Also, group 1 instructions write a variable number of bits
(which are tracked using extra 'length' bits).
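To make the dependence on the last group concrete, here is a rough sketch of the size arithmetic. It assumes a 32-bit uncompressed instruction and the 3-bit group code from the note on page 572 of the paper. The function names, the group fractions and the average compressed sizes for the other groups are made-up illustrative values, not figures from the paper.

```python
INSTR_BITS = 32  # uncompressed instruction width
TAG_BITS = 3     # group code appended to an uncompressed instruction (paper, p. 572 note)


def average_stored_bits(groups):
    """Average stored bits per instruction.

    `groups` is a list of (fraction, stored_bits) pairs, one per group.
    The fractions must sum to 1. All numbers passed in are assumptions.
    """
    total_fraction = sum(fraction for fraction, _ in groups)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("group fractions must sum to 1")
    return sum(fraction * bits for fraction, bits in groups)


def compression_ratio(groups):
    """Compressed size divided by original size (lower is better)."""
    return average_stored_bits(groups) / INSTR_BITS


# Hypothetical mix; the last entry is the uncompressed group, stored in 32 + 3 bits.
example_mix = [
    (0.50, 20),                     # assumed average size for group 1
    (0.30, 16),                     # assumed average size for the other compressed groups
    (0.20, INSTR_BITS + TAG_BITS),  # uncompressed group
]

if __name__ == "__main__":
    print(f"compression ratio: {compression_ratio(example_mix):.3f}")
```

With these assumed numbers, every additional instruction that lands in the uncompressed group costs 35 bits instead of 32, so the ratio worsens as that fraction grows, and a program made up only of uncompressed instructions would end up larger than the original.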
The paper mentions that a buffer is required when a large number of
group 1 instructions arrive from the cache for decoding. Is this a
required feature considering that the controller can tell the CPU to
stop fetching more instructions (through the stall signal)? It would
be required if there is a delay between the pipeline becoming full and
the controller "realizing" that the pipeline has become full. It would
also be required if there is only one decoding unit for each group of
instructions, so that when two instructions belonging to the same group
arrive back-to-back, the second has to be buffered. But in
such a case, wouldn't the buffer be required for all groups instead of
just for group 1? I think the buffers being discussed here are not
"buff 1" and "buff 2" because, as the authors mention, this
buffer is required when the pipeline is full and hence should be used
*prior* to decompressing the instructions.
As pointed out by Dr. Gupta in class, since the Instruction decoder
passes the fetched instructions directly to the Branch
Unit, there must be some decompression logic embedded in the
Instruction decoder. Alternatively, the Branch Unit must be capable of
decompressing instructions before executing them. Also, SAMC is used
for group 1 instructions but the paper does not mention the method of
compression used for group 2 and group 3 instructions.
The extra hardware required is in the form of decoding and dictionary
tables, the controller, buffers and other components along with the
logic to connect all of these together. As the authors point out, for
large applications, the extra cost of hardware is offset by the memory
reduction benefit and hence this architecture may not suit smaller
applications.
Lastly, in the experimental results section, a percentage breakdown
of the number of instructions belonging to each group would have given
better insight into the performance and energy improvement obtained
from this architecture.
Ashay Rane
I think the buffer you mention in your second paragraph is indeed not
buf1 or buf2. In my opinion, the buffer here is the "input
buffer" (ibuf in Fig. 2 of the paper). Why? In the paper, the authors
say the buffer is needed because in some cases there won't be
a whole instruction for the decoder to send down its path, and the 32
bits must be stored until the next cycle, when the rest of the
instruction arrives. Another reason for having this buffer is
that if the pipeline is full when more group 1 instructions arrive
from the cache, they must be buffered temporarily. So, from these
statements, I think the "buffer" in the paper refers to the input
buffer.
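Here is a toy sketch of the two cases described above, only to illustrate the idea. It is not the paper's design: the class name, the method names and the way the instruction length is supplied are all hypothetical (in the real decoder the length would come from the group code and the 'length' bits).

```python
class InputBuffer:
    """Toy model of an input buffer holding bits fetched from the cache.

    Assumption: the caller already knows the length of the next instruction.
    """

    def __init__(self):
        self.bits = ""  # buffered bits as a string of '0' and '1'

    def push_word(self, word_bits):
        """Append one fetched word (for example 32 bits) to the buffer."""
        if set(word_bits) - {"0", "1"}:
            raise ValueError("word_bits must contain only '0' and '1'")
        self.bits += word_bits

    def pop_instruction(self, length, pipeline_full=False):
        """Return the next `length` bits, or None if the decoder must wait.

        The decoder waits when the pipeline is full (bits stay buffered)
        or when the buffer does not yet hold a whole instruction.
        """
        if pipeline_full or len(self.bits) < length:
            return None
        instruction, self.bits = self.bits[:length], self.bits[length:]
        return instruction


if __name__ == "__main__":
    buf = InputBuffer()
    buf.push_word("1" * 32)
    print(buf.pop_instruction(20))  # a whole 20-bit instruction is available
    print(buf.pop_instruction(20))  # only 12 bits left, so the decoder waits
```

In this sketch the leftover bits simply stay in the buffer until the next word arrives, and a full pipeline is handled the same way: nothing is popped, so the fetched bits wait in the buffer.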
As for your third paragraph, as you say, the authors only mention
that Group 1 uses SAMC to compress instructions and say nothing about
the other groups. I also wonder how they compressed the instructions in
Group 2 and Group 3. The authors point out that those instructions
need to be decompressed without saying how the decoder works. I think
this is because the paper mainly focuses on the benefit of the
decompression design with SAMC for group 1 instructions.
Best regards,
Yi-hsin
Regarding your weak point (e), I think the code they append is 3 bits
rather than 4 bits for an uncompressed instruction. In the paper, on
page 572, the note at the bottom of column 1 says that uncompressed
instructions will be stored in more than 32 bits since a "3-bit code"
is appended to differentiate them from other groups.
Best Regards,
Yi-hsin
On Nov 8, 1:08 PM, "Ayan Banerjee" <abane...@gmail.com> wrote:
> Please find my critique in the attached file.
>
> critique.pdf