
Stanford Mill/encoding video now up on YouTube


Ivan Godard

Jun 5, 2013, 12:05:26 PM
This should fix the access problems that people have been reporting when
trying to watch it direct from Stanford. Please report any further
issues to me. Thanks.

http://www.youtube.com/watch?v=CiMyRwF-oA4&list=UUBa5G_ESCn8Yd4vw5U-gIcg

Ivan Godard

Jun 5, 2013, 12:33:50 PM

Ivan Godard

Jun 5, 2013, 1:37:44 PM
On 6/5/2013 9:05 AM, Ivan Godard wrote:
> This should fix the access problems that people have been reporting when
> trying to watch it direct from Stanford. Please report any further
> issues to me. Thanks.

And yet another link:
http://youtu.be/LgLNyMAi-0I

BGB

Jun 27, 2013, 2:39:40 AM
watched the video, and admittedly it is a little hard for me to fully
understand (this not really being my main area).


one thing I did start wondering is whether something like this
would be possible (and possibly simpler?):
initially, you grab, say, 256 bits.

either everything executes all at once, or the block is split in half;
the resulting sub-blocks then either execute at once or are spread over
multiple cycles.

the other part is that, rather than execution proceeding in both
directions, it could proceed via an even/odd path.

for example, something sort of like:
1  Block A Read (E)
2  Block B Read (O)
   Block A Split 2x128 bit
3  Sub-Block A1 Executes
   Block B Split 4x64
   Block C Read (E)
4  Sub-Block A2 Executes
   Sub-Block B1 Executes
   Block C Direct
   Block D Read (E)
5  Sub-Block C1 Executes
   Sub-Block B2 Executes
   Block D Split 4x64
6  Sub-Block D1 Executes
   Sub-Block B3 Executes
   Block F Read (O)
7  Sub-Block D2 Executes
   Sub-Block B4 Executes
   Block F Split 2x64
8  Sub-Block D3 Executes
   Sub-Block F1 Executes
...
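
The cycle table above can be reproduced with a small Python sketch. The timing rule is inferred from the table rather than stated in the post: a block is read at cycle r, split (or passed through directly) at r + 1, and its sub-blocks execute from r + 2 onward, with each path (E or O) issuing its next read so that execution on that path is back-to-back. The names `schedule` and `EXAMPLE` are made up for illustration, and the sketch does not address what happens if both paths want to read in the same cycle.

```python
from collections import defaultdict

def schedule(paths):
    """Return {cycle: [event, ...]} for blocks issued on independent paths.

    paths maps a path name to (first_read_cycle, [(block_name, n_subblocks), ...]).
    Assumed timing (inferred from the table, not specified in the post):
    read at cycle r, split or pass through at r + 1,
    sub-block i (0-based) executes at r + 2 + i.
    """
    events = defaultdict(list)
    for path, (read, blocks) in paths.items():
        for name, n in blocks:
            events[read].append(f"Block {name} Read ({path})")
            events[read + 1].append(
                f"Block {name} Direct" if n == 1 else f"Block {name} Split x{n}")
            for i in range(n):
                events[read + 2 + i].append(f"Sub-Block {name}{i + 1} Executes")
            # The next block on this path is read early enough that its first
            # sub-block executes right after this block's last one.
            read += n
    return dict(events)

# Blocks from the table: even path carries A, C, D; odd path carries B, F.
EXAMPLE = {
    "E": (1, [("A", 2), ("C", 1), ("D", 4)]),
    "O": (2, [("B", 4), ("F", 2)]),
}

for cycle, evs in sorted(schedule(EXAMPLE).items()):
    print(cycle, "; ".join(evs))
```

For cycles 1 through 8 this gives the same events as the table (in a different order within a cycle); it also continues past the "..." into cycle 9, which the post does not show.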

if needed, it could be split further: say we have 4 paths X, Y, Z, W,
which are in turn interleaved (possibly with 4 different instruction
pointers).

likewise, each block would have a particular instruction form and hold N
opcodes.


possibly block layouts could be something like:
0AAXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX_TBBXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX
1AAXXXXX_XXXXXXXX_TBBXXXXX_XXXXXXXX_TCCXXXXX_XXXXXXXX_TDDXXXXX_XXXXXXXX
2AAXXXXX_TBBXXXXX_TCCXXXXX_TDDXXXXX_TEEXXXXX_TFFXXXXX_TGGXXXXX_THHXXXXX
3AAXTBBX_TCCXTDDX_TEEXTFFX_TGGXTHHX_TIIXTJJX_TKKXTLLX_TMMXTNNX_TOOXTPPX
4AAXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX_0BBXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX
5AAXXXXX_XXXXXXXX_TBBXXXXX_XXXXXXXX_1CCXXXXX_XXXXXXXX_TDDXXXXX_XXXXXXXX
6AAXXXXX_TBBXXXXX_TCCXXXXX_TDDXXXXX_2EEXXXXX_TFFXXXXX_TGGXXXXX_THHXXXXX
7AAXTBBX_TCCXTDDX_TEEXTFFX_TGGXTHHX_3IIXTJJX_TKKXTLLX_TMMXTNNX_TOOXTPPX
8-
9AAXXXXX_XXXXXXXX_1BBXXXXX_XXXXXXXX_1CCXXXXX_XXXXXXXX_1DDXXXXX_XXXXXXXX
AAAXXXXX_TBBXXXXX_2CCXXXXX_TDDXXXXX_2EEXXXXX_TFFXXXXX_2GGXXXXX_THHXXXXX
BAAXTBBX_TCCXTDDX_3EEXTFFX_TGGXTHHX_3IIXTJJX_TKKXTLLX_3MMXTNNX_TOOXTPPX

where the first nibble indicates the block format (encoding both a
layout and a temporal split), in this case typically with 8-bit
opcodes (the AA, BB, ... pairs). T would be special bits (to maintain the
pattern; possibly they would give bits to adjacent opcodes to allow a
6/10-bit opcode or similar), and X is operand payload.

formats 0-3 would be single-cycle blocks, 4-7 take 2 cycles, and 8-B take 4
cycles. in split blocks, the secondary tags indicate the instruction
layout of a given sub-block.

layout 0 would have 128 bits/op, 1 has 64 bits/op, 2 has 32 bits/op, and 3
has 16 bits/op.
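
As a rough illustration of the format nibble described above, here is a Python sketch. It assumes the high two bits of the nibble select the temporal split (1, 2, or 4 cycles) and the low two bits select the layout of the first sub-block; that split of the nibble is read off the listed formats and is not spelled out in the post. `BLOCK_BITS` and `decode_format` are hypothetical names.

```python
BLOCK_BITS = 256  # block size used in the post

def decode_format(nibble):
    """Return (cycles, bits_per_op, ops_per_sub_block) for a format nibble.

    Assumption: bits 3..2 give the temporal split, bits 1..0 give the layout.
    """
    if not 0 <= nibble <= 0xB:
        raise ValueError(f"format {nibble:X} is not defined in the post")
    cycles = 1 << (nibble >> 2)        # 0-3 -> 1, 4-7 -> 2, 8-B -> 4
    bits_per_op = 128 >> (nibble & 3)  # 0 -> 128, 1 -> 64, 2 -> 32, 3 -> 16
    sub_block_bits = BLOCK_BITS // cycles
    if bits_per_op > sub_block_bits:
        # Format 8: a 128-bit op cannot fit in a 64-bit sub-block,
        # which matches the "8-" placeholder in the list.
        raise ValueError(f"format {nibble:X} is unused")
    return cycles, bits_per_op, sub_block_bits // bits_per_op

for fmt in (0x0, 0x3, 0x5, 0x9, 0xB):
    print(f"{fmt:X}", decode_format(fmt))
```

Format 5, for example, comes out as 2 cycles with two 64-bit ops per 128-bit sub-block, which agrees with the AA/BB and CC/DD grouping in its layout line.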


or such...

BGB

Jun 27, 2013, 3:56:40 AM
minor add:
probably need to add a mechanism for each block to encode a "skip" of
how many blocks until its next block.

probably the middle tag could reserve 2 bits:
0=next physical block;
1=2nd next physical block;
2=3rd next physical block;
3=4th next physical block.

probably with formats 2/3, 6/7, A/B reserving 2 more bits (allowing a skip
of up to 16 blocks). basically, since these take longer to execute, there is
more chance of them becoming further misaligned with the execution stream.

possibly with a special escape case for cases where the next block is
out of range (skip nop?).
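
A minimal Python sketch of the skip field, assuming a skip value of 0 means the next physical block and that formats 2/3, 6/7, A/B (the ones with bit 1 of the format nibble set) get a 4-bit field while the rest get 2 bits. Where the bits actually sit in the middle tag, and how the out-of-range escape would work, are left open in the post and are not modelled here; `skip_field_bits` and `next_block` are hypothetical names.

```python
def skip_field_bits(fmt):
    """Width of the skip field for a format nibble (assumed: 4 bits if bit 1 is set)."""
    return 4 if fmt & 2 else 2

def next_block(index, fmt, skip):
    """Physical index of the block that follows `index` on the same path.

    skip == 0 means the next physical block, skip == 3 the 4th next, and so on.
    """
    limit = 1 << skip_field_bits(fmt)
    if not 0 <= skip < limit:
        raise ValueError(f"skip {skip} does not fit in format {fmt:X}")
    return index + skip + 1

print(next_block(10, 0x1, 0))   # next physical block
print(next_block(10, 0x1, 3))   # 4th next physical block
print(next_block(10, 0xB, 15))  # furthest reach with the 4-bit field
```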


idle (side note):
a basically similar block-layout is used in a custom audio codec of
mine, just the blocks are 1024/512/256 bits, and it is naturally CBR (it
was designed mostly for fast decoding and random access, hence the
unorthodox design idea of using fixed-size/fixed-format blocks rather
than entropy coding...). (basically, it can be randomly accessed
reasonably quickly using a sample-cache, and does not require up-front
decoding into PCM).


