Daedalus --> MUPEN64+ Dynarec Transplant Request

Jojo the clown

Jan 19, 2015, 7:46:37 PM

to mupen...@googlegroups.com

Hi guys

According to the internet, the PSP and Ingenic chips share the same architecture, MIPS32r2, so I would like to know whether the dynarec from Daedalus could be transplanted into Mupen64+.

https://github.com/hulkholden/daedalus/tree/master/Source/DynaRec

The two mainstream device communities that would (immediately) benefit from this are the GCW-Zero and CI20 communities.

On paper, the GCW is more powerful than the PSP (I believe they're apples to apples), and it is a traditional Linux handheld console (along the lines of the GP2X, Caanoo, Wiz, etc.). While it is true that the GCW's screen is limited to 320x240, few N64 games ever went up to 640x480, so I would really like to hear from M64+ devs on the possibility of a transplant to get Mupen64+ running on it.

https://www.youtube.com/watch?v=329rX1QTIkY

The CI20, on the other hand, is a $65 beast of a single-board computer/devkit based on a dual-core 1.2 GHz SoC, from Imagination Tech (the people who own MIPS and PowerVR), and it is the other platform that would benefit from this:

https://www.youtube.com/watch?v=I2FCHRDUPc8

Here are the full specs/community links for both devices:
ImgTec's Creator CI20:
https://elinux.org/CI20_Hardware#Tech_Spec_overview
https://groups.google.com/forum/#!forum/mips-creator-ci20
GCW-Zero:
https://wiki.surkow.com/Quick_Start_Guide#Specifications
http://boards.dingoonity.org/index.php#c10

Also worth mentioning: the GCW-Zero has an open source driver for its SoC's GPU in the form of the etnaviv project, which is capable of OpenGL ES 2.0 (https://github.com/etnaviv/etna_viv) and can be seen in action in the Neverball/Neverputt video above.

Looking forward to hearing from you l337 d3vs!

Regards.

Dorian FEVRIER

Jan 20, 2015, 6:23:38 PM

to mupen...@googlegroups.com

Hi Jojo!

I always love such sexy toys (even though they often break the law and/or don't contribute back to FOSS, but that's another problem).

Long story short: the number of GPU/CPU architectures keeps increasing and we lack devs, so any dev wanting to contribute is welcome. There is some dev activity around the dynarec [1], so I guess it's a good time to get started, but you have to motivate the original devs: a dynarec is not something you copy-paste over existing code.

Second, I may be wrong on this, but Daedalus is not written like mupen64plus, so it wouldn't be that easy. Still, the Daedalus code would be a very good resource for anyone wanting to add a MIPS32r2 dynarec to mupen64plus.

So if anyone wants to integrate a MIPS32 dynarec into mupen64plus, they are welcome, but it's definitely not as simple as copy-paste! :)

[1] https://github.com/mupen64plus/mupen64plus-core/commits/master

--
You received this message because you are subscribed to the Google Groups "mupen64plus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mupen64plus...@googlegroups.com.
To post to this group, send email to mupen...@googlegroups.com.
Visit this group at http://groups.google.com/group/mupen64plus.
For more options, visit https://groups.google.com/d/optout.

Jojo the clown

Jan 21, 2015, 12:04:32 AM

to mupen...@googlegroups.com, fevrier...@yahoo.fr

Hi Narann!

Thanks for the input. As far as breaking the law goes, I suggest you check the nature of this forum. I'm just asking for the emulator to support MIPS, not for usage/sharing/distribution of ROMs. Emulation isn't breaking the law, and neither is reverse engineering drivers for GPUs. As for contributing to the FOSS community, I think you should check the community's progress first before slandering wantonly; yes, the hardware of the device itself is closed, but the underlying software layer isn't: https://github.com/gcwnow

Thanks & Regards,

Dorian Fevrier

Jan 21, 2015, 1:08:20 AM

to Jojo the clown, mupen...@googlegroups.com

I guess I didn't express myself well: I'm not talking about
usage/sharing/distribution of ROMs. I'm not arguing that emulation or
reverse engineering GPU drivers is breaking the law (the emu scene is
heavily based on RE). And finally, I'm not slandering the FOSS community
(I consider myself a member of it :D ).

Once again I'm a huge fan of those toys. I love retro gaming and love
having tiny tech toys.

But it's _often_ (and that's the word I used) a mess on the legal side:

http://wiki.gp2x.org/articles/g/p/l/GPL_Violation.html
http://boards.dingoonity.org/the-rumor-mill/the-official-fall-of-the-neo-geo-x/msg82528/#msg82528
http://boards.dingoonity.org/dingoo-releases/dingoo-emulation-pack-v1-0/msg40105/#msg40105
http://www.harteex.com/hosted/a320.freeforums.org/dooengine-v0-9c-t1198-40.html
http://www.libretro.com/index.php/retroarch-license-violations/

But I'm happy to see GCW Zero guys taking GPL seriously.

About the community open source GPU driver: don't get me wrong, it's
great and Etnaviv seems quite advanced, but since the GCW Zero was
Kickstarted, I would have pushed for publication of the GPU specs. That
way you could focus on a better driver instead of people having to do
RE work. This is what the Raspberry Pi folks did.

Anyway, if anyone has the skills and motivation to create a MIPS32 branch
of the dynarec for mupen64plus, it would be appreciated. :)

Dorian

Jojo the clown

Jan 21, 2015, 3:20:19 AM

to mupen...@googlegroups.com, forphu...@gmail.com

Hi Dorian,

Thanks a lot for that, and I'm sorry for jumping the gun; I wasn't aware of all that backstory, so I understand where you're coming from now :s As far as the RPi is concerned, IMO the reason BCM opened up the entire documentation spec was that they already knew they weren't going to stick around long in the mobile SoC market (just my conspiracy theory!)

Hopefully a very talented and motivated Mupen64+ dev (like yourself??) will come along, look at the Daedalus code and go "challenge accepted!" XD

Many members of the community who have been clamoring for N64 support would be forever indebted (I know I would!). I'd personally even donate some $$ to your efforts!

Thanks & Regards,

bobby smiles

Jan 21, 2015, 4:38:16 AM

to mupen...@googlegroups.com, forphu...@gmail.com

Hi,

Please take a look at this downstream project [1].
I haven't tried it, but it looks like this dev ported m64p to MIPS platforms.

Regards,
Bobby

[1] https://github.com/Nebuleon/

alexkr...@gmail.com

Jan 21, 2015, 7:45:43 AM

to mupen...@googlegroups.com, forphu...@gmail.com

Hi Bobby,

I am also very interested in this project, as I think MIPS32 will have a bigger presence in the coming years (because Imagination Technologies is pushing it as a mobile platform). I am looking at https://github.com/Nebuleon/mupen64plus-build and it looks like an attempt to build it for said (GCW) system, but there is a disclaimer in the readme:

"'neb-dynarec' is a deprecated branch as of 2014-09-04, which has become too complicated for its own good."

So I don't know how good this is for a dynarec contribution to mupen64plus upstream. I have also looked through the Dingoonity boards and have yet to find a single example of successful N64 emulation on the GCW (a lot of requests and talk, though), so I am dubious about the status of this build (as far as a MIPS32r2 dynarec is concerned).

Can somebody else confirm this?

Until then I will look at the Daedalus source, since it is a proven example of a MIPS32r2 dynarec for N64 emulation.

Thanks

Dorian FEVRIER

Jan 21, 2015, 9:02:03 AM

to mupen...@googlegroups.com, forphu...@gmail.com

Interesting.

It seems that only the Makefile has changed, for the core at least.

nebuleo...@gmail.com

Jan 22, 2015, 4:03:30 AM

to mupen...@googlegroups.com, forphu...@gmail.com

Hi, this is Nebuleon, the downstream developer mentioned in the message by Bsmiles. I came here via [1], as well as because more people told me about it.

On Wednesday, January 21, 2015 at 12:45:43 PM UTC, alexkr...@gmail.com wrote:

> Hi Bobby,
>
> I am also very interested in this project, as I think MIPS32 will have a bigger presence in the coming years (because Imagination Technologies is pushing it as a mobile platform).
>
> I am looking at https://github.com/Nebuleon/mupen64plus-build and it looks like an attempt to build it for said (GCW) system, but there is a disclaimer in the readme:
>
> "'neb-dynarec' is a deprecated branch as of 2014-09-04, which has become too complicated for its own good."

That's correct. neb-dynarec was essentially an experiment in writing a JIT from scratch that would work on both 32-bit and 64-bit host processors, with their different features like indexed/base+disp addressing and 16-bit offset addressing (such as MIPS emulating itself), and that would support different flavors of split 64-bit opcodes. Unfortunately, the need to support both 32-bit and 64-bit hosts made the code balloon in complexity, with ifdefs absolutely everywhere, and I was not really doing much intermediate representation (IR) optimisation at all. So I officially deprecated that code and stopped working on it.

Then, as I discovered the wonders of IR, I tried compiling opcodes via LLVM and its IR on the branch llvm-experiments [2]. Development was super speedy and the code was verifiably correct even when optimised by LLVM, but as I added more and more opcodes to the recompiler, I found out that LLVM is not very suitable for JIT [3], especially when megabytes of MIPS opcodes are being converted to LLVM IR. For example, on my 2.8 GHz Core i5, LLVM took 5 full seconds to compile the code used by the title screen of Super Mario 64 (USA) (the bendy Mario face).

I also found out that LLVM IR provides no way whatsoever to set the native FPU rounding mode, which is pretty important for some games. llvm-experiments does not recompile any FPU opcode at all.

So yeah.
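For context on the rounding-mode point: on the R4300 FPU, the rounding mode is the RM field in the low two bits of control register FCR31 (0 = round to nearest, 1 = toward zero, 2 = toward +infinity, 3 = toward -infinity). Below is a minimal sketch, assuming a C99 host with `<fenv.h>`, of how a recompiler written in C might translate that field into the host's rounding-mode constant. `n64_rm_to_fenv` is a hypothetical helper, not part of mupen64plus or LLVM.

```c
#include <fenv.h>

/* Hypothetical helper: map the RM field (bits 1..0 of the R4300 FCR31
 * control register) to the matching C99 <fenv.h> rounding constant.
 * Higher FCR31 bits (flags, enables, FS, condition bit) are ignored. */
static int n64_rm_to_fenv(unsigned int fcr31)
{
    switch (fcr31 & 3u) {
    case 0:  return FE_TONEAREST;   /* RN: round to nearest (even) */
    case 1:  return FE_TOWARDZERO;  /* RZ: round toward zero       */
    case 2:  return FE_UPWARD;      /* RP: round toward +infinity  */
    default: return FE_DOWNWARD;    /* RM: round toward -infinity  */
    }
}
```

A JIT could pass the result to the standard `fesetround()` whenever the guest writes FCR31 (via CTC1), before running recompiled FPU code. That is the kind of host FPU state change that, as described above, could not be expressed in LLVM IR.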

> So I don't know how good this is for a dynarec contribution to mupen64plus upstream.

Me neither ^_^ Though I advise against merging anything I've done so far, personally.

> I have also looked through the Dingoonity boards and have yet to find a single example of successful N64 emulation on the GCW (a lot of requests and talk, though), so I am dubious about the status of this build (as far as a MIPS32r2 dynarec is concerned).
>
> Can somebody else confirm this?

I hereby confirm both the volume of requests and hope/hype, as well as the lack of successful example. People in the GCW Zero community have only been fishing for developers so far, who seem to have all declined [1] [4] [6] [this thread], though Narann seems to be accepting of outside contributions (thanks, by the way!).

I have also been mostly secretive about my efforts, not wanting to be bombarded by "please tell me your progress!" or "please do it! we believe!" posts and private messages while I was learning, because then I would feel very pressured and crumble. I'm still not ready, though I did contribute some commits to mainline mupen if you look hard enough. ^_^

> Until then I will look at the Daedalus source, since it is a proven example of a MIPS32r2 dynarec for N64 emulation.

(Even though that wasn't directed at me:) All right. Its JIT seems to be well designed and a good base for further examination, but beware the details: it has *lots* of speed hacks, notably for the TLB, and in many opcodes it mishandles register $0, whose value should stay 0 at all times.
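To illustrate the $0 point: on MIPS, writes to register $0 are discarded and reads always return 0, yet many real instructions name $0 as their destination (it is a common way to encode a NOP). One simple way for an interpreter to respect this is sketched below, assuming a hypothetical 32-entry, 64-bit register file (`cpu_state`, not mupen64plus's or Daedalus's actual structure). A JIT would more typically skip emitting the store whenever the destination is $0.

```c
#include <stdint.h>

/* Hypothetical guest register file: 32 general-purpose 64-bit registers. */
typedef struct {
    uint64_t gpr[32];
} cpu_state;

/* ADDU rd, rs, rt: 32-bit add, result sign-extended to 64 bits.
 * rd may legitimately be 0, in which case the write must have no effect. */
static void op_addu(cpu_state *s, unsigned rd, unsigned rs, unsigned rt)
{
    uint32_t sum = (uint32_t)s->gpr[rs] + (uint32_t)s->gpr[rt];
    s->gpr[rd] = (uint64_t)(int64_t)(int32_t)sum;
    /* Cheapest interpreter-side fix: unconditionally restore $0 afterwards,
     * instead of special-casing rd == 0 in every opcode handler. */
    s->gpr[0] = 0;
}
```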

Replying to the thread as a whole, I've been reluctant to contribute anything because, well, my early attempt has been crap, and LLVM is very slow. I have been trying to learn more about basic blocks, opcode splitting, register assignment, control flow graphs, SSA and compiler optimisations so that I may implement a more lightweight multi-architecture JIT than with LLVM. However:

a) That is a lot to learn for someone who only has a college education and has not been formally introduced to graph and set theory and related algorithms.

b) I don't know if I want to work on a JIT for a device that *I don't even see running any game full-speed*.

N64 registers need to be reloaded into native registers via memory reads when jumping into new blocks, each time costing *4 cycles* [5]; there is a lot of recompiled code to handle, and it's too big to fit in our 16 KiB instruction cache, which has a *4-cycle* miss penalty; and, given that the Ingenic JZ4770 is MIPS32r2 as previously noted, 64-bit register loads and MIPS III opcodes must be split, which makes load and execution times soar and inflates the code size even more, leading to more instruction cache misses.

Between that, resolving N64 memory references, trapping writes to already-recompiled code in RDRAM and invalidating it, the RSP, the RDP, and the graphics plugin, each N64 frame-time would need at least 2 GCW Zero frame-times in the simplest of games and the most optimal of JITs, and thus even frameskip wouldn't help.

The GCW Zero runs the Cached Interpreter at 6 (Super Mario 64) to 17 (Conker's Bad Fur Day) frame-times per N64 frame-time.
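To make the splitting cost mentioned above concrete: on a 32-bit MIPS32r2 host, each 64-bit N64 register has to live in two 32-bit halves, so even a simple MIPS III DADDU becomes several host operations plus a carry. The sketch below is a plain C illustration assuming a hypothetical hi/lo pair layout (`reg64_split`). A JIT would emit the equivalent host instructions (roughly ADDU, SLTU for the carry, ADDU, ADDU) rather than call a function.

```c
#include <stdint.h>

/* Hypothetical layout: one 64-bit guest register kept as two 32-bit halves,
 * as it would have to be on a 32-bit host. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} reg64_split;

/* DADDU (64-bit add, no overflow trap) using only 32-bit operations. */
static reg64_split daddu_split(reg64_split a, reg64_split b)
{
    reg64_split r;
    r.lo = a.lo + b.lo;               /* low word, wraps modulo 2^32        */
    uint32_t carry = (r.lo < a.lo);   /* wrapped low word => carry out      */
    r.hi = a.hi + b.hi + carry;       /* high word absorbs the carry        */
    return r;
}
```

One guest instruction turns into four host instructions here, and the register-pair loads and stores around it double as well, which is where the code-size growth comes from.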

I hope this clears things up for everyone involved. ^_^

Regards,
Neb.

[1] http://boards.dingoonity.org/gcw-zero-emulation/n64-emulator/
[2] https://github.com/Nebuleon/mupen64plus-core/commits/llvm-experiments
[3] http://stackoverflow.com/questions/6833068/why-is-llvm-considered-unsuitable-for-implementing-a-jit
[4] http://forums.daedalusx64.com/viewtopic.php?f=43&t=4265
[5] http://boards.dingoonity.org/gcw-development/memory-access-timings-for-the-jz4770/
[6] http://www.drastic-ds.com/viewtopic.php?f=4&t=2395

bobby smiles

Jan 23, 2015, 8:11:16 AM

to mupen...@googlegroups.com, forphu...@gmail.com, nebuleo...@gmail.com

@Nebuleon:
Thanks for this very informative answer!
I saw your experiments with LLVM some months ago, and I was very curious about how they performed.
I was secretly hoping it would be a success, so that we would only have to feed LLVM with IR and let it do the conversion to the assembly the machine understands (MIPS, ARM, x86, x86_64, ...). That would have greatly decreased the complexity of the code and eased its maintenance (for now we have the pure interpreter, the cached interpreter, the old dynarec (x86, x86_64) and new_dynarec (x86, ARM)).
Unfortunately, according to your experiments, a custom dynarec is still the better option.

Since you have tinkered quite a bit with the m64p core codebase, I'd like to get some of your thoughts/opinions about the current codebase, so we can improve it.

Regards,
Bobby

nebuleo...@gmail.com

Jan 23, 2015, 5:40:37 PM

to mupen...@googlegroups.com

Hi Bobby,

About LLVM: yeah, it is very slow to request compilation from. I'd have to use interpretation as the first tier, and then have the code switch to LLVM once a block has been executed a certain number of times, a bit like Java and the Sun HotSpot compiler.
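The tiered approach described above (interpret first, compile only hot blocks) could look roughly like the sketch below. It is an illustration under stated assumptions, not a design from the thread: `interpret_block` and `compile_block` are hypothetical placeholders standing in for the interpreter and the expensive LLVM-based compiler, blocks are identified by a small integer id, and the threshold of 100 executions is an arbitrary example value.

```c
#include <stddef.h>
#include <stdint.h>

#define HOT_THRESHOLD 100   /* arbitrary example value */
#define MAX_BLOCKS    1024  /* callers must pass id < MAX_BLOCKS */

typedef void (*native_fn)(void);

/* Hypothetical per-block bookkeeping. */
typedef struct {
    uint32_t  exec_count;   /* times run through the interpreter tier */
    native_fn compiled;     /* NULL until the block has been compiled */
} block_info;

static block_info blocks[MAX_BLOCKS];
static unsigned interpreted_runs, compile_requests;

/* Placeholders for the two real tiers. */
static void native_stub(void) {}
static void interpret_block(size_t id) { (void)id; interpreted_runs++; }
static native_fn compile_block(size_t id)
{
    (void)id;
    compile_requests++;     /* this is where the slow LLVM call would go */
    return native_stub;
}

/* Run one block: use native code if available; otherwise interpret it,
 * count the execution, and request compilation once the block is hot. */
static void run_block(size_t id)
{
    block_info *b = &blocks[id];
    if (b->compiled) {
        b->compiled();
        return;
    }
    interpret_block(id);
    if (++b->exec_count >= HOT_THRESHOLD)
        b->compiled = compile_block(id);
}
```

With this scheme, a block that runs only a handful of times (such as one-off initialization code) never pays the compilation cost.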

As for the core, here are some of my observations, in no particular order:

1. The Hacktarux JIT depends on the Cached Interpreter's behavior and structures, and adds things that are not needed when using only the Cached Interpreter. For example, there's a structure for each instruction in the Cached Interpreter [1] that is either 98 bytes [2] or 156 bytes [3]. That memory is only required by the Hacktarux JIT and is wasted otherwise, and in an array of these structures most elements are not aligned to cache lines. Modifying the signature or behavior of C functions whose calls are baked into native code generated by the Hacktarux JIT (memory accessors, gen_interupt, etc.) breaks it.

2. The Hacktarux JIT does not use blocks very well. If there's a jump to an opcode that hasn't yet been recompiled, the entire page is recompiled, in a way that any of the opcodes may then be the target of a jump. So if the native code used for the target of the jump has registers allocated, the "jump_wrapper" is executed first to load those [5]. Additionally, there can be no optimisation of runs of opcodes, due to the need to be able to jump to any of them later. (Say you jump to 0x8003_2C4C and the code is [LUI $4, 0x8020; ORI $4, $4, 0xC140; LW $4, 0($4)]. The constant memory reference cannot be optimised, because other code could jump to 0x8003_2C54, skipping the LUI+ORI.)

Forming different code blocks according to which instruction is jumped-to would make more efficient code and get rid of the register-loading jump stubs, because then each block would start off with nothing allocated and load exactly what it needs. That would also allow constant propagation for things like the code above, [LUI $4, 0x8020; ORI $4, $4, 0xC140; LW $4, 0($4)], which is really a reference to the constant address 0x8020_C140 in the N64 address space and can be turned directly into a load from *(uint32_t*) ((uint8_t*) rdram + 0x20C140). Those are pretty common references in N64 games. If the code is jumped-to at the third instruction, the LW, then a separate block that assumes no known value in $4 would be made.

3. Use of global variables. All memory accessors work by reading the value of 'address'. Read accessors (LB, LH, LW...) then read the value of 'rdword' and store through that pointer; write accessors (SB, SH, SW...) then read the value of 'cpu_byte', 'hword', 'word' or 'dword'. The caller has stored values there in memory, and the memory accessor must reload them from memory. Since this happens millions of times per second, performance would be better if the values could simply be passed as function parameters, which go into registers in most ABIs. But this would break the Hacktarux JIT. (Not the New Dynarec, because it has its own memory accessors.)

4. Empty stubs are required for the Hacktarux JIT. empty_dynarec.c must be compiled and linked in even when its functions would be unused (i.e. !DYNAREC or NEW_DYNAREC [4]). I'm sure there's a technical reason, though I'm not sure what it is.

5. The Pure Interpreter depends on the Cached Interpreter. The Pure Interpreter asks the Cached Interpreter to prefetch 2 opcodes, the one at PC and its possible delay slot. This time, only 2 precomp_instr structures are used so it doesn't waste memory, and the structures are likely to be always in cache, but the Pure Interpreter is not self-contained.
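The constant-propagation example in point 2 can be sketched as follows. This is a simplified illustration with hypothetical helper names, and it assumes 32-bit address arithmetic (the real R4300 sign-extends LUI results into 64-bit registers). LUI places its immediate in the upper 16 bits, ORI zero-extends its immediate and ORs it in, and a KSEG0 address (0x8000_0000 to 0x9FFF_FFFF) maps to a physical address by clearing the top 3 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* LUI rt, imm: the immediate goes into the upper 16 bits. */
static uint32_t fold_lui(uint16_t imm) { return (uint32_t)imm << 16; }

/* ORI rt, rs, imm: the immediate is zero-extended before the OR. */
static uint32_t fold_ori(uint32_t rs, uint16_t imm) { return rs | imm; }

/* Translate a constant KSEG0 virtual address into a physical address.
 * Returns false outside KSEG0; those addresses would need the TLB or
 * another segment handler in a real emulator. */
static bool kseg0_to_phys(uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr < 0x80000000u || vaddr > 0x9FFFFFFFu)
        return false;
    *paddr = vaddr & 0x1FFFFFFFu;
    return true;
}
```

With the opcodes from point 2, LUI 0x8020 followed by ORI 0xC140 folds to 0x8020_C140, which maps to physical 0x0020_C140. Because that is below the RDRAM size, a recompiler could emit a single host load from `rdram + 0x20C140` for the LW, as long as the block is only entered at the LUI. A block entered at the LW itself would not know $4 and would need the generic path.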

Keeping all the memory-related code, the Coprocessor 0 and FPU opcodes, the TLB, exceptions and interrupts in their separate files is a nice touch; it's just the various interpreter and JIT drivers that all go back to the Cached Interpreter. Compatibility with the Hacktarux JIT may also be holding the core back.
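Point 3 above contrasts two accessor calling styles. Both are sketched below, assuming a hypothetical 8 MiB RDRAM array and simplified, aligned big-endian word reads. The global names `address` and `rdword` follow the ones mentioned in point 3, but the functions themselves are illustrative, not mupen64plus's actual accessors.

```c
#include <stdint.h>

#define RDRAM_SIZE 0x800000u   /* 8 MiB, assumed for illustration */
static uint8_t rdram[RDRAM_SIZE];

/* Style 1: arguments travel through globals. The caller stores into
 * 'address' and 'rdword', and the accessor must reload both from memory. */
static uint32_t address;
static uint64_t *rdword;

static void read_word_globals(void)
{
    uint32_t a = (address & (RDRAM_SIZE - 1)) & ~3u;   /* aligned offset */
    *rdword = ((uint64_t)rdram[a] << 24) | ((uint64_t)rdram[a + 1] << 16) |
              ((uint64_t)rdram[a + 2] << 8) | rdram[a + 3];
}

/* Style 2: the address is a parameter and the value is returned, so both
 * can stay in registers under most calling conventions. */
static uint32_t read_word_params(uint32_t addr)
{
    uint32_t a = (addr & (RDRAM_SIZE - 1)) & ~3u;      /* aligned offset */
    return ((uint32_t)rdram[a] << 24) | ((uint32_t)rdram[a + 1] << 16) |
           ((uint32_t)rdram[a + 2] << 8) | rdram[a + 3];
}
```

Both return the same raw 32-bit word (sign extension into the 64-bit register is left to the LW handler). The difference lies only in how arguments travel, which matters when the call happens millions of times per second and when a JIT has baked the global-based convention into its generated code.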

Regards,
Neb.

[1] https://github.com/mupen64plus/mupen64plus-core/blob/fe84dea/src/r4300/recomp.h#L69-L70
[2] https://github.com/mupen64plus/mupen64plus-core/blob/fe84dea/src/r4300/x86/assemble_struct.h
[3] https://github.com/mupen64plus/mupen64plus-core/blob/fe84dea/src/r4300/x86_64/assemble_struct.h
[4] https://github.com/mupen64plus/mupen64plus-core/blob/9e5e1da/projects/unix/Makefile#L453-L490
[5] https://github.com/mupen64plus/mupen64plus-core/blob/fe84dea/src/r4300/x86/rjump.c#L57-L60

bobby smiles

Jan 24, 2015, 11:20:24 AM

to mupen...@googlegroups.com, nebuleo...@gmail.com

Thanks for your feedback on the core.

I cannot really comment on the efficiency of the various dynarecs because I don't understand them very well.
However, I strongly agree with your opinions on the usage of global variables and all the other hard-to-break dependencies.
That's why I recently proposed 2 PRs to improve the situation:
[1] is an attempt to avoid using global variables for accessing memory (i.e. address, word, hword, cpu_byte, dword, rdword, ...).
The problem is that it broke the new_dynarec's ARM backend and I have not been able to fix that for now.
[2] provides better modularization of core components. It largely follows the architecture proposed by MarathonMan for his CEN64 emulator
(which is a real model of clean code :) ).

I hope it will ease the development of the core and help attract new contributors [like you :) ].

Other points I wish we would improve include:
- having a single "build system" (for now we maintain a Makefile and several Visual Studio solution/project files).
Switching to CMake might ease the synchronization between the current build systems and provide devs with "native" build instructions.
- having testable code (unit tests, test ROMs for various hardware-verified behaviors) [removing global variables is a step toward having testable code]
- killing compatibility with the zilmar spec altogether, and integrating the relevant plugin behavior inside the core.
[Rationale:
- we already break the zilmar spec by using a Config API, so externally written plugins already need to be ported to our emulator.
- the spec imposes a fixed interface that we cannot break for "compatibility" reasons, so there is no chance of refactoring things here if we let plugins remain separate entities.
- the proliferation of multiple plugins (especially gfx plugins) splits dev time and greatly increases maintenance effort.]
I am aware that removing plugins is a massive effort, and it wouldn't be wise to do that now (especially with gonetz's soon-to-come new GFX plugin).
But one can dream...

[1] https://github.com/mupen64plus/mupen64plus-core/pull/47
[2] https://github.com/mupen64plus/mupen64plus-core/pull/62
