How to simulate a new Hardware Unit and its x86_64 ISA extension

Andrea Mondelli

May 14, 2013, 7:09:09 PM
to snip...@googlegroups.com, Alberto Scionti
Hi All,

I'm starting to use snipersim for research purposes, and I'd like to understand whether there is a way to implement an ISA extension in the simulator. In particular, I'd like to simulate an additional hardware module that is accessed by the standard core(s) when they execute special instructions that I want to add to the ISA as an extension. To be clearer, I'd like to create a software module that functionally simulates the behavior of that hardware module in the architecture and that also provides timing information to snipersim.
Example: suppose I want to add a new Hardware Failure Unit and related new instructions (extending the standard x86_64 ISA) such as ask_authorization_to_execute() or put_thread_in_queue_and_wait(). Suppose I want to implement the functional behaviour of these new ISA instructions in a separate "software module" (for example, by intercepting specific asm instructions that otherwise do nothing useful). New programs (using the extended ISA) will see only what these instructions return, because the simulator emulates their execution on the new external hardware unit. I want to:
1) implement the functional behaviour in software (because the standard x86_64 ISA doesn't support these instructions)
2) implement the timing model for this new hardware unit (with cache accesses, internal tables and queues, bus accesses, and so on)
3) run programs (such as modified benchmarks) that use these new instructions, simulating this new "architecture", where "architecture" means off-the-shelf CPU + bus + memory hierarchy + my new hardware module, and obtain results

Is there any way to implement this in Sniper? Would it be possible, in your opinion?

Suggestions are welcome! Thanks in advance

A.

Wim Heirman

May 15, 2013, 8:45:04 AM
to snip...@googlegroups.com, Alberto Scionti
Hi Andrea,

This is definitely possible with Sniper, although it will take a bit
of work depending on the level of detail at which you want to model
this (but given Sniper's abstraction level I'd say it would be a lot
more doable than on other, more detailed simulators).

Functional side: The easiest thing would be to use magic instructions
for this. In your application, #include <sim_api.h>, then call
SimUser(a, b) to execute a new instruction. You can use a and b to
encode arguments or instruction type, and expect a return value (all
uint64_t). If needed one of those arguments can point to a struct with
more parameters. The functional emulation can go in
pin/instruction__modeling.cc:handleMagic(), here you can look at a and
b (in the rbx and rcx registers) and do the proper thing, potentially
changing the application's state (use Pin's PIN_SetContextReg),
updating other per-thread state to simulate extra registers (extend
the ThreadLocalStorage structure for that), or returning a value
through the rax register.
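
A minimal sketch of the application-side encoding described above: one argument carries the instruction type, the other carries either a plain value or a pointer to a struct with more parameters. The names `simUserStub`, `CustomOp`, and `QueueArgs` are hypothetical and not part of Sniper; the stub only stands in for `SimUser(a, b)` so the example can run without the simulator, and its "hardware" rules are toy placeholders.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcodes for the new instructions, passed as argument `a`.
enum CustomOp : uint64_t {
    OP_ASK_AUTHORIZATION = 1,
    OP_PUT_THREAD_IN_QUEUE = 2,
};

// Hypothetical parameter block; its address is passed as argument `b`.
struct QueueArgs {
    uint64_t thread_id;
    uint64_t priority;
};

// Stand-in for SimUser(a, b) (assumption: under Sniper the real call would
// reach the simulator's magic-instruction handler instead of this function).
uint64_t simUserStub(uint64_t a, uint64_t b) {
    switch (a) {
    case OP_ASK_AUTHORIZATION:
        // Toy rule: even resource ids are granted (1), odd ones denied (0).
        return (b % 2 == 0) ? 1 : 0;
    case OP_PUT_THREAD_IN_QUEUE: {
        // `b` is interpreted as a pointer to the parameter block.
        const QueueArgs *args =
            reinterpret_cast<const QueueArgs *>(static_cast<uintptr_t>(b));
        assert(args != nullptr);
        // Toy ticket number derived from the parameters.
        return args->thread_id * 100 + args->priority;
    }
    default:
        return 0;  // unknown opcode: behave like a NOP
    }
}
```

In a real application the call sites would presumably look the same, with `SimUser` from `sim_api.h` in place of the stub and the dispatch logic living on the simulator side.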

Timing side: I think doing a
core->getPerformanceModel()->queueDynamicInstruction(), using your own
subclass of Instruction as the argument, from your functional emulator
is probably the easiest way to get the instructions into the timing
model. You'll see these come out in PerformanceModel::iterate() where
you can add the required modeling: i.e. advance time, do memory
accesses (which will follow the normal path through the memory
hierarchy) using core->accessMemory(), etc.
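
The queue-then-iterate flow described above can be illustrated with a toy model. This is only a sketch of the idea, not Sniper code: `ToyInstruction` and `ToyPerformanceModel` are hypothetical stand-ins for an `Instruction` subclass and the performance model, and the latencies are made-up nanosecond counts rather than Sniper's own time type.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical stand-in for a dynamic instruction handed to the timing model.
struct ToyInstruction {
    bool is_custom;       // true for an instruction served by the new unit
    uint64_t latency_ns;  // cost charged when the instruction is processed
};

// Toy timing model: the functional side queues instructions, and iterate()
// later consumes them and advances the simulated clock.
class ToyPerformanceModel {
public:
    void queueInstruction(const ToyInstruction &ins) { m_queue.push(ins); }

    // Drain the queue, charging each instruction's latency to the clock.
    void iterate() {
        while (!m_queue.empty()) {
            const ToyInstruction ins = m_queue.front();
            m_queue.pop();
            m_elapsed_ns += ins.latency_ns;
            if (ins.is_custom)
                m_custom_ns += ins.latency_ns;  // time spent in the new unit
        }
    }

    uint64_t elapsed() const { return m_elapsed_ns; }
    uint64_t customTime() const { return m_custom_ns; }

private:
    std::queue<ToyInstruction> m_queue;
    uint64_t m_elapsed_ns = 0;
    uint64_t m_custom_ns = 0;
};
```

The point of the split is that the functional emulator only has to describe what happened, while all time accounting stays in one place, which is where memory accesses and other modeling for the custom unit would be added.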

Note that your new ISA extensions, by using magic instructions, will
be NOPs if you execute this application natively. You can use
SimInSimulator(), also defined in sim_api.h, to figure out whether
you're running in the simulator or not.
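
One way to use this check is to fall back to a pure-software path when running natively. The sketch below is an assumption-laden illustration: `SimHooks`, `askAuthorization`, the opcode value, and the fallback policy are all hypothetical, and the two hooks are injected so that the example runs without `sim_api.h`; in a Sniper build they would presumably wrap `SimInSimulator()` and `SimUser(a, b)`.

```cpp
#include <cstdint>

// Hypothetical hooks standing in for SimInSimulator() and SimUser(a, b).
struct SimHooks {
    bool (*in_simulator)();
    uint64_t (*user)(uint64_t a, uint64_t b);
};

// Pure-software fallback for a hypothetical "ask authorization" instruction.
inline uint64_t askAuthorizationSoftware(uint64_t resource_id) {
    return resource_id < 16 ? 1 : 0;  // toy policy, for illustration only
}

// Use the simulated hardware unit when available, otherwise the fallback.
inline uint64_t askAuthorization(const SimHooks &hooks, uint64_t resource_id) {
    const uint64_t kOpAskAuthorization = 1;  // hypothetical opcode
    if (hooks.in_simulator())
        return hooks.user(kOpAskAuthorization, resource_id);
    return askAuthorizationSoftware(resource_id);
}
```

With this shape the same binary gives meaningful results both natively and under the simulator, rather than silently executing NOPs in the native case.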

I know I've probably been a bit vague here at points, but I don't know
exactly what you'll need -- so hopefully this will allow you to get
started and do let us know if you need more information.

Regards,
Wim


[1] http://software.intel.com/sites/landingpage/pintool/docs/55942/Pin/html/group__CONTEXT__API.html#ga3f8746ccdac1c1fbcb2e2f3f3cd7bcb

Andrea Mondelli

May 15, 2013, 11:03:28 AM
to snip...@googlegroups.com




On Wednesday, May 15, 2013 at 14:45:04 UTC+2, Wim Heirman wrote:
Hi Andrea,

[CUT]

Thanks for the reply. Let me try to explain my point better:

MM: main -> SI -> SI -> SI -> CI -> SI -> CI -> SI -> return
SI = standard x86_64 instructions
CI = custom x86_64 instructions

MM is a matrix multiplier (a simple benchmark program, as an example).
Every SI invokes the "standard" timing model (cache, memory, bus, and so on).
Every CI invokes the "custom" timing model plus the "standard" timing model (e.g. custom cache, standard bus, second custom bus, new hardware tables and queues).



> Functional side: The easiest thing would be to use magic instructions
> for this. In your application, #include <sim_api.h>, then call
> SimUser(a, b) to execute a new instruction. You can use a and b to
> encode arguments or instruction type, and expect a return value (all
> uint64_t). If needed one of those arguments can point to a struct with
> more parameters. The functional emulation can go in
> pin/instruction__modeling.cc:handleMagic(), here you can look at a and
> b (in the rbx and rcx registers) and do the proper thing, potentially
> changing the application's state (use Pin's PIN_SetContextReg),
> updating other per-thread state to simulate extra registers (extend
> the ThreadLocalStorage structure for that), or returning a value
> through the rax register.

OK, I think I understand; I'll try it as soon as possible with some test functions. The goal is that, when I run a CI, the TM (timing model) of the new hardware (my custom hardware unit) is used instead of the one for the normal hardware.
 

> Timing side: I think doing a
> [CUT]

Correct me if I'm wrong: I can implement my CIs and their functional behaviour first, and add the architecture design and its timing model later.

MM: main -> SI -> SI -> SI -> return
Time: main + n + n + n + return, where n is the latency of each SI (cache latency, memory, and so on)

MM + my new ISA: main -> SI -> SI -> CI -> SI -> CI -> return
with the Sniper timing model only: main + n + n + 0 + n + 0 + return
with the Sniper timing model and my hardware unit timing model: main + n + n + m + n + m + return
n = standard latency
m = my hardware unit latency

 
> Note that your new ISA extensions, by using magic instructions, will
> be NOPs if you execute this application natively. You can use
> SimInSimulator(), also defined in sim_api.h, to figure out whether
> you're running in the simulator or not.

Do you have any examples, or specific Sniper code that I should read, to understand this?
 

> I know I've probably been a bit vague here at points, but I don't know
> exactly what you'll need -- so hopefully this will allow you to get
> started and do let us know if you need more information.

Your answer was very, very helpful.
My example was a Hardware Failure Unit, but it could be a new FP (Floating Point) unit, or some other non-standard hardware unit that does something when I call a new ISA instruction inside standard programs. My idea is to use Sniper to simulate a new machine composed of standard x86-64 units plus a custom unit. The standard units (CPU, cache, bus, memory) are already present in Sniper; I want to add the custom hardware unit (and its functional behaviour).
 

Wim Heirman

May 17, 2013, 3:51:14 AM
to snip...@googlegroups.com
> Correct me if I'm wrong. I can implement my CI and the functional behaviour,
> then I can add later the architecture design and its timing model.
>
> MM: main -> SI -> SI -> SI -> return
> Time: main + n + n + n + return where n is the latency of each SI (cache
> latency, memory, and so on)
>
> MM + my new ISA: main -> SI -> SI -> CI -> SI -> CI -> return
> with sniper timing model: main + n + n + 0 + n + 0 + return
> with sniper timing model and my hardware unit timing model: main + n + n + m
> + n + m + return
> n = standard latency
> m = my hardware unit latency

Yes, that's correct.
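
The accounting confirmed here can be written compactly. This assumes, as the quoted example does, that every SI costs the same latency n and every CI the same latency m; the symbols T_main, T_return, N_SI, and N_CI are shorthand introduced only for this note.

```latex
T_{\text{total}} = T_{\text{main}} + N_{\text{SI}} \cdot n + N_{\text{CI}} \cdot m + T_{\text{return}}
```

For the trace main -> SI -> SI -> CI -> SI -> CI -> return, N_SI = 3 and N_CI = 2, so the total is T_main + 3n + 2m + T_return. With only the functional emulation in place and no custom timing model, m = 0 and the total reduces to T_main + 3n + T_return.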

>> Note that your new ISA extensions, by using magic instructions, will
>> be NOPs if you execute this application natively. You can use
>> SimInSimulator(), also defined in sim_api.h, to figure out whether
>> you're running in the simulator or not.
>
>
> Do you have some examples or specific sniper code that I need to read to
> understand this ?

The example in sniper/test/api uses this.

-Wim