Compiling Open Sparc Assembly

Devon White

Sep 4, 2017, 3:36:54 PM
to OpenPiton Discussion
Hi,
I just tried and failed, for the second time, to build a working gcc cross compiler, so I was wondering what tool you guys use to compile C code into OpenSPARC assembly that can be run on the board using pitonstream. Thanks for the help,
-Devon

jbalkind

Sep 4, 2017, 4:54:11 PM
to OpenPiton Discussion

Hi Devon,

Have you tried the cross compiler we provide in the cross_compiler/ subdirectory of our download page? We've been using that without any trouble.

Let us know if we have to tweak something to get it working on other systems.

Thanks,
Jon

Devon White

Sep 6, 2017, 4:14:21 PM
to OpenPiton Discussion
I tried this and got it to work, so thank you. I have a follow-up, though: the code I compile with it seems to fail for no apparent reason when I run it on the board. For example, I took this code, compiled it with "sparc64-linux-gnu-gcc -S basic.c" to get basic.s, and used pitonstream to run it on the board:

#include "libc.h"
int main() {
    //int x;
    //x = 2;
    pass();
}

It compiled and passed (I have to add a #include "boot.s" to the assembly file, but that's fine). Then when I uncomment and try this:

#include "libc.h"
int main() {
    int x;
    x = 2;
    pass();
}

It compiles and loads successfully, but the test fails. I do not have any idea why this would be the case, and would appreciate any help figuring it out. I have attached the assembly files.
-Devon
basic_add.s
basic_no_add.s

Alexey Lavrov

Sep 6, 2017, 8:03:30 PM
to OpenPiton Discussion
Hi Devon,

Do you get an error code when test fails?
One possible reason is that the st instruction, which saves the local variable in the stack frame, is accessing an unmapped memory region or corrupting some other data.
Admittedly, we have not yet tested running C tests on an FPGA using pitonstream. It may be that pitonstream assumes a fixed memory layout, which holds for tests built with sims but may be violated by code compiled with gcc.

You probably want to know how to fix this.
The file $DV_ROOT/design/chipset/rtl/storage_addr_trans_unified.v maps processor addresses to DDR memory addresses.
You can start by checking that the hit_any_section signal is always true during address translation while a test runs. If it is not, the behavior is unpredictable.
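To make the idea concrete, here is a small C model of what a section-based translation unit does: each section maps a window of processor addresses onto a window of DDR addresses, and an address that falls in no window corresponds to hit_any_section being false. This is an illustrative sketch only; the struct, the function name and the example addresses are hypothetical and are not taken from storage_addr_trans_unified.v.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL description of one mapped section: processor addresses in
 * [va_base, va_base + size) are stored starting at DDR address ddr_base. */
typedef struct {
    uint64_t va_base;
    uint64_t size;
    uint64_t ddr_base;
} section_map_t;

/* Look up addr in the section table. On a hit, write the DDR address to
 * *ddr_addr and return true; on a miss return false, which plays the role
 * of hit_any_section being deasserted. */
bool translate_addr(const section_map_t *map, size_t n,
                    uint64_t addr, uint64_t *ddr_addr)
{
    for (size_t i = 0; i < n; i++) {
        /* Written as a subtraction so va_base + size can never overflow. */
        if (addr >= map[i].va_base && addr - map[i].va_base < map[i].size) {
            *ddr_addr = map[i].ddr_base + (addr - map[i].va_base);
            return true;
        }
    }
    return false; /* no section covers addr */
}
```

In these terms, the stack store gcc emits for x = 2 would go wrong if the stack address lies outside every window, which is the situation a false hit_any_section points to.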

Let me know if it helps.

Alexey

Devon White

Sep 7, 2017, 3:12:44 PM
to OpenPiton Discussion
How were you guys able to run C code on this FPGA?
We need a method to compile C to run on the FPGA.
It is the st instruction that is causing the failure. How do we fix this?

Devon White

Sep 8, 2017, 3:12:04 PM
to OpenPiton Discussion
Apologies for the hasty previous response; allow me to clarify. We are trying to run BFS on the FPGA. As this is a fairly large and complex algorithm, we will need to implement it in C. Our current plan was to run the C code through the cross compiler with the -S flag to get the assembly code, and then use pitonstream to run the compiled assembly code on the board. Is there another method you used to run C code on the FPGA that does not use pitonstream? Our current method works for smaller test cases, but we have to add some boot code from your assembly tests at the beginning of the assembly file and remove some of what gcc adds. We are not sure this will be scalable to a large application like BFS.

We have also run into a problem where gcc will do memory accesses with stb that are only byte aligned, which does not work with pitonstream. We have some ideas for how to fix this, but any suggestions or reasons that this does not work would be good.

I apologize again for the last message, my colleague wrote it in haste, and thank you again for your help,
-Devon

Alexey Lavrov

Sep 8, 2017, 3:40:46 PM
to OpenPiton Discussion
Hi Devon,

I see your point. There is definitely a need for you to be able to compile large C programs.
Could you clarify what you mean by "not scalable to a large application"?
If you mean loading speed, we have a faster solution for Genesys2 and NexysVideo boards. However, it's not yet in the release.
If you are talking about the size of the mapped memory region, it can be changed in storage_addr_trans_unified.v. You just have to analyze which sections are generated by gcc.
I can help you change it for your needs.
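As a rough guide to that analysis, the sketch below annotates where GCC typically places each kind of object on an ELF target. This is general GCC behavior rather than anything OpenPiton-specific, the names are made up for illustration, and exact section names vary with flags (at -O2, for instance, main can land in .text.startup). Every section that ends up non-empty, plus the stack, has to be covered by the mapping.

```c
#include <stdint.h>

/* Initialized, non-zero global: normally placed in .data. */
int64_t node_count = 42;

/* Uninitialized global: normally placed in .bss (with -fno-common it is a
 * real .bss definition rather than a COMMON symbol). */
int64_t visited[64];

/* const-qualified global: normally placed in .rodata. */
const int64_t max_nodes = 64;

/* Function code: normally placed in .text. */
int64_t add_nodes(int64_t n)
{
    /* Locals live in registers or in the stack frame, not in any section,
     * so the stack region must be mapped as well. */
    int64_t total = node_count + n;
    node_count = total;
    return total;
}
```

If your cross binutils include objdump, running sparc64-linux-gnu-objdump -h on the compiled object lists the sections that were actually produced.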

Also, do you have to run tests on bare metal? You can run C tests from an OS too.

No worries, I totally understood your reasoning.
We are glad that you find OpenPiton platform useful for your research.

Alexey

Devon White

Sep 8, 2017, 4:01:55 PM
to OpenPiton Discussion
So in the long run what we are trying to do is add a scratch pad memory to the design and add an opcode to write and another to read from it. I did not previously consider booting an OS on the board and then running tests from there, as it seemed more complicated. Do you think that it would be simpler to boot an OS and run C tests from there to test this new functionality?

When I said "not scalable" I meant more in the sense that our workflow is not scalable. We are currently using the cross-compiler to get assembly code, but the code which comes out of the cross compiler needs to be edited by hand in several ways in order to get it to run correctly on the board using pitonstream. If I have to do significant edits to a simple function that is 5 lines of C code long, it's unlikely that I can use this method on a 100 line function and have it work.

Let me know your thoughts, I appreciate it,
-Devon

jbalkind

Sep 8, 2017, 4:22:16 PM
to OpenPiton Discussion
Hi Devon,

Here's my 2 cents:

Try using pitonunimap and add your C test to our infrastructure. To do that, make an accompanying .s file that has a C declaration in it (see factorial.s and factorial.c for an idea of how that works). Then, you can just run yourtest.s and the infrastructure will call the C compiler for you and then treat it like an assembly test. This requires that you set the PITON_GCC environment variable in piton_settings.bash. What pitonunimap does is generate a new storage_addr_trans_unified.v which has the correct memory mapping. The existing version of the file supports a mapping that's broad enough to cover a chunk of our assembly tests, but not everything (I think we've only seen it work for maybe 100 of our common tests). That'll be why you see the failures on some loads or stores.

Note that the storage_addr_trans_unified.v will be compiled as part of the verilog so there's a chance it won't work for your other C code. If you get an idea of how things are laid out by reading the storage_addr_trans_unified.v then you might be able to manually expand it to cover more tests. This is still a little unwieldy but probably better than what you've got now.

As for your opcode extension: I'd suggest adding a splitter and just memory mapping the component. You could alternatively make use of some of the ASI space we use for configuring execution drafting (see piton/design/chip/tile/sparc/rtl/cfg_asi.v). This requires the use of load/store alternate instructions.
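If you take the memory-mapped route, the C side comes down to volatile loads and stores to the component's address range. The sketch below assumes a hypothetical base address and 64-bit words; the helpers take the base as a parameter so the same code can be exercised against ordinary memory off-target. It does not cover the ASI alternative, which needs load/store alternate instructions and therefore inline assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL physical address of the scratchpad window; replace it with
 * whatever range your splitter actually decodes to the new component. */
#define SPM_BASE 0x0000001000000000ULL
#define SPM_DEV  ((volatile uint64_t *)(uintptr_t)SPM_BASE)

/* Store one 64-bit word at word index idx. volatile makes the compiler emit
 * every access and keep volatile accesses in program order relative to each
 * other; hardware ordering may still need a membar. */
static inline void spm_write(volatile uint64_t *base, size_t idx, uint64_t value)
{
    base[idx] = value;
}

/* Load one 64-bit word at word index idx. */
static inline uint64_t spm_read(volatile uint64_t *base, size_t idx)
{
    return base[idx];
}
```

On the board, spm_write(SPM_DEV, 0, v) would issue the store; whether that store is permitted from the mode your test runs in is a separate question.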

If you decide you want to run on linux you'll probably need to make use of hypercalls for reading and writing the new memory you're adding. If you find that what you're doing works, it might not be worth trying linux.

I hope this might help. Please ask if you have any other questions!
Jon

Alexey Lavrov

Sep 8, 2017, 9:18:50 PM
to OpenPiton Discussion
Hi Devon,

Jon made a very good point! 
You need to use pitonunimap to generate a new mapping and reimplement the project.
I used the steps below to run factorial.c successfully on genesys2 board.
The only problem for now is that libc doesn't implement printf or puts. However, it is solvable.

1) Go to $MODEL_DIR.
2) Run pitonunimap -f list.txt -b genesys2, where list.txt contains the line factorial.s. If you want the mapping to support more tests, add each one on a separate line.
3) In $DV_ROOT/design/chipset/rtl, copy storage_addr_trans.v to storage_addr_trans_unified.v and change the module name from storage_addr_trans to storage_addr_trans_unified.
4) Run protosyn -b genesys2 -d system --uart-dmw ddr and wait for it to finish.
5) Set the PITON_GCC environment variable to point to sparc64-linux-gnu-gcc.
6) Program the board with the generated .bit file.
7) Run pitonstream -b genesys2 -f list.txt.

The test should pass without any modifications to the intermediate assembly file.
However, as I mentioned, all printf statements will be ignored.

Let us know if it works for you.

Alexey

Devon White

Sep 20, 2017, 6:53:29 PM
to OpenPiton Discussion
Hi again,
So what you suggested appears to be working fine with our very basic C tests, so now we are trying to implement a basic multithreaded test to make sure that the design is running both cores. Unfortunately, we get a linker failure when running pitonunimap:
midas: g_ld -b elf64-sparc -no-warn-mismatch --no-check-sections -T diag.ld_scr -o diag.exe
multi_core.o: In function `main':
multi_core.o(.text.startup+0x30): undefined reference to `pthread_create'
multi_core.o(.text.startup+0x50): undefined reference to `pthread_join'
multi_core.o(.text.startup+0x120): undefined reference to `__stack_chk_fail'
midas: At pkg=Midas::Interface, file=/home/cmuresearch/Research/sadpiton/piton/tools/perlmod/Midas/3.30/lib/site_perl/5.8.0/Midas/Interface.pm, line=370
midas: FATAL ERROR: M_LINKFAIL (#18): Linker failed.
midas: FATAL ERROR: Command "g_ld -b elf64-sparc -no-warn-mismatch --no-check-sections -T diag.ld_scr -o diag.exe" failed with status 1.

This corresponds to the c code compiled with:
sparc64-linux-gnu-gcc -m64 -fno-common -I. -I.. -I/home/cmuresearch/Research/sadpiton/piton/verif/diag/c/include -pthread -O2 -S multi_core.c -o multi_core.s

I am wondering if you have any ideas as to what would be causing this error. Thanks again for the help,
-Devon

Jonathan Balkind

Sep 21, 2017, 12:11:12 AM
to OpenPiton Discussion

Hi Devon,

First: Are you compiling your C to assembly, then using that assembly as your .s? I ask as I'm a little confused by the compilation command you showed. I believe you should be able to put your C code directly into our infrastructure, following what factorial.c and factorial.s look like (factorial.s includes directives on how to compile factorial.c and what to link to it). Sims would then call the C compiler for you.

On your compilation error in our system: we don't provide a real libc, hence the undefined references. If you have a libc.a, you can include it using the MIDAS_LIB directive in your .s file. However, I'm not sure whether thread-local storage is going to work correctly, and I don't know how or if that will affect pthreads. I don't even know that targeting sparc64-linux-gnu is necessarily what you want, because we don't have any syscalls implemented, and I would assume that something like pthread_create() will try to perform some syscalls. Perhaps you can implement something for threading yourself. It's really unfortunate that Sun didn't release the code for this, as it's referenced in one of the other files.

I'm wondering whether it would be better to try to create an ELF file that could then be run on top of the hypervisor on FPGA. An example of this is shown in one of the original OpenSPARC videos (https://youtu.be/ZCX03bU8TSM?t=76). I'm not totally sure what the path is in this direction, though, as we haven't done anything at all with this.

Sorry this is so littered with uncertainty. You're at about the point I reached when I was playing with running C using the simulation infrastructure, so I'm operating more on speculation here.

Jon

--
You received this message because you are subscribed to the Google Groups "OpenPiton Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openpiton+...@googlegroups.com.
To post to this group, send email to open...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/openpiton/f2c157d4-e40e-47e3-a8b1-caba1114c574%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Devon White

Sep 21, 2017, 4:20:19 PM
to OpenPiton Discussion
We are using the same structure as factorial.s in order to run the C code. So, we make an assembly file with:

#include "c/template_mt.s"
MIDAS_CC FILE=cmu/multi_core.c ARGS= -O2 -S

We added a -pthread flag to support pthreads, which is causing the failure.

We really need to be able to run multicore C code. Are there any examples of this?

We are also running on emulation, not simulation.

Are there any assembly tests that you have that run on multiple cores?

Thanks,
Chris

Jonathan Balkind

Sep 25, 2017, 5:05:11 PM
to OpenPiton Discussion

Hi Devon, Chris,

Apologies, I misunderstood what you were saying. I see now how it works within our infrastructure.

We don't have any examples of multicore C code, but there are many examples within our assembly code. The tile1 suite of tests contains many that use multiple hardware threads, and every test in tile2, tile4, tile8, etc. uses multiple cores. I think you should be able to look at how those tests start other threads, synchronise, and do other operations, and create C wrappers for those. We haven't had any need to do this ourselves, though. It's certainly possible that you will be able to massage a libc to work in the same setting too.

Please let us know how we can help further,

Devon White

Oct 3, 2017, 4:18:06 PM
to OpenPiton Discussion
Hi,
So we're trying to implement our own C print function based on the UART prints in uart-hi-piton.s. Our assembly code looks like this and is placed in libc.s:
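    .align 4
    .global uart_char_print
uart_char_print:
    setx ADDR1, %o2, %o1    ! leaf routine (no save): use scratch %o registers, not %l
    stb %o0, [%o1]
    retl                    ! leaf return via %o7; ret would jump through the caller's %i7
    nop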

The header in libc.h is:
extern void uart_char_print(char letter);

This is failing upon being called and we do not know why. Any help is appreciated.
-Devon

Alexey Lavrov

Oct 5, 2017, 1:48:05 PM
to OpenPiton Discussion
Hi Devon,

It might be because you are in user mode and trying to access memory that you are not allowed to access.
Can you provide us with TT and TPC, which should be printed in the terminal?

Alexey

Devon White

Oct 5, 2017, 2:40:29 PM
to OpenPiton Discussion
Hi,
So, they don't appear to be printed to the terminal.
We run the following:

pitonstream -b vc707 -f list.txt

list.txt includes only the test, and that test runs only the print command.

This is the asm generated from the C:

    .file    "test_print.c"
    .section    .text.startup,"ax",@progbits
    .align 4
    .global main
    .type    main, #function
    .proc    04
main:
    save    %sp, -176, %sp
    mov    68, %o0
    call    uart_char_print, 0
    mov    0, %i0
    call    pass, 0
    nop
    return    %i7+8
    nop
    .size    main, .-main
    .ident    "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609"
    .section    .note.GNU-stack,"",@progbits

Which is generated from the C code below:

#include "libc.h"
int main() {
  uart_char_print('D');
  pass();
}

The only output we get is:

UART will be configured for 115200 baud rate
Press reset button on FPGA
Waiting...

Configuration is complete

Running test_print.s: 1 out of 3 test
Checking correctness of section mapping...  Correct!
Used 7886 out of 16777216 blocks of storage
Loading a test...
100%
TEST OUTPUT >>>

<<< END OF TEST OUTPUT
test_print.s : FAILED

How would we enter a different mode with our C code?

Thanks,
Chris

Jonathan Balkind

Oct 6, 2017, 3:13:17 PM
to OpenPiton Discussion

Hi Devon,

As Alexey said, I think the code is executed in user mode (unprivileged mode), so when you try to perform the store instruction the test will fail, as storing to I/O addresses requires you to be in hyperprivileged mode. I think you can get around this by making a hypercall trap that you call instead. There are examples of how to do this in a number of our assembly tests. You could just add the code to htraps.s and then have your user-mode function invoke the appropriate trap number. I think Alexey has some sort of example of this in the code that prints TT and TPC, but I'm not sure if that code is already executing in hyperprivileged mode. Alexey, could you clarify this?

Jon

Devon White

Oct 24, 2017, 4:23:35 PM
to OpenPiton Discussion
Hi,
We've successfully written an assembly uart char print that we can call, however, we can only call this function if it is in the same file as the routine that is calling it. This is problematic because we want to make a C wrapper for it so that we can use it in C. Do you know why calling a function in a separate file would compile successfully with gcc and midas but then fail at runtime?
-Devon
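Once a callable uart_char_print exists (declared in libc.h as extern void uart_char_print(char letter);), the rest of a minimal print facility can be plain C on top of it, which also works around the missing printf. In the sketch below, uart_puts and uart_print_hex are hypothetical helper names, and the uart_char_print defined here is a host-only stand-in that records characters so the logic can be checked; on the board you would delete it and link the assembly routine instead.

```c
#include <stddef.h>
#include <stdint.h>

/* HOST-ONLY stand-in for the assembly uart_char_print: it appends to a
 * buffer instead of writing the UART. Remove it when linking the real one. */
static char uart_log[256];
static size_t uart_log_len;
void uart_char_print(char letter)
{
    if (uart_log_len + 1 < sizeof uart_log) {
        uart_log[uart_log_len++] = letter;
        uart_log[uart_log_len] = '\0';
    }
}

/* Print a NUL-terminated string one character at a time. */
void uart_puts(const char *s)
{
    while (*s != '\0')
        uart_char_print(*s++);
}

/* Print a 64-bit value as 16 hex digits, most significant first.
 * Only shifts and masks are used, so no division routine is needed. */
void uart_print_hex(uint64_t value)
{
    static const char digits[] = "0123456789abcdef";
    for (int shift = 60; shift >= 0; shift -= 4)
        uart_char_print(digits[(value >> shift) & 0xf]);
}
```

The string literals and the digits table live in .rodata, so that section has to be covered by the memory mapping as well.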

Devon White

Oct 29, 2017, 2:15:17 PM
to OpenPiton Discussion
Alternatively if you could give us some documentation on the assembler we can probably figure it out. The problem appears to be that the assembler doesn't find the file that has the print definitions in it, and we don't know how to fix that.
-Devon

Jonathan Balkind

Oct 31, 2017, 2:30:48 PM
to OpenPiton Discussion

Hi Devon,

There are a few directives you can make use of. Those are MIDAS_CC, MIDAS_OBJ, and MIDAS_LIB. I think what you want to do is just have multiple MIDAS_CC lines in your single .s file. If that isn't doing it for you, you can create your own static .a file, and you should be able to include that using MIDAS_LIB. If even that fails, then I think that MIDAS_OBJ will work with a precompiled .o file.

Please let me know if these approaches don't solve the problem.

Thanks,
Jon

Devon White

Nov 9, 2017, 3:56:44 PM
to OpenPiton Discussion
It ended up having to do with some of our assembly being odd. But something that we cannot seem to get working is running code on multiple cores.
We are using the th_fork function with two loops of different lengths. One causes a pass and the other causes a fail, so the outcome should depend on how long the loops are.
However, it depends on whichever loop comes first in the code.

How do we show that we can run code on two cores?

Jonathan Balkind

Nov 9, 2017, 4:41:33 PM
to Devon White, OpenPiton Discussion

Hi Devon,

Glad you got it to compile. When running multithreaded tests, I believe midas sets the -DCIOP flag, which means the first thread will wake up all the others partway through its boot code. This means the other threads will start their execution behind the first. Have you tried writing a barrier at the start of the functions to make sure the threads are synchronised? You could also partially verify this by making the first thread's loop run for much longer.
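A barrier like the one suggested here can be written in plain C with GCC's __atomic builtins. The sketch below is a standard sense-reversing counter barrier; the type and function names are hypothetical, total must equal the number of threads that call it, and the barrier object must sit in memory that every thread reaches at the same physical address.

```c
/* HYPOTHETICAL shared barrier state; place it in memory all threads see. */
typedef struct {
    int count;  /* threads still expected in the current phase */
    int total;  /* number of participating threads */
    int sense;  /* flips each time the barrier opens */
} barrier_t;

void barrier_init(barrier_t *b, int total)
{
    b->count = total;
    b->total = total;
    b->sense = 0;
}

/* Each thread passes its own local_sense, initialized to 0 before the
 * first call and kept across calls. */
void barrier_wait(barrier_t *b, int *local_sense)
{
    *local_sense = !*local_sense;  /* sense value this phase ends with */
    if (__atomic_sub_fetch(&b->count, 1, __ATOMIC_ACQ_REL) == 0) {
        /* Last arrival: reset the counter, then release everyone. */
        __atomic_store_n(&b->count, b->total, __ATOMIC_RELAXED);
        __atomic_store_n(&b->sense, *local_sense, __ATOMIC_RELEASE);
    } else {
        /* Spin until the last arrival publishes the new sense. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) != *local_sense)
            ;
    }
}
```

Flipping the sense rather than waiting for the counter to hit zero lets the same barrier be reused immediately, because a fast thread entering the next phase cannot confuse the previous phase's release with the current one.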

Devon White

Nov 10, 2017, 3:47:45 PM
to OpenPiton Discussion
Hi again,
So I can see that midas is using the -DCIOP flag, but still only one thread is running. I am very confident that only one thread is running after writing several tests. What would be the reason that we only have one thread, even though we have synthesized with 2 cores?
-Devon

Jonathan Balkind

Nov 10, 2017, 3:50:41 PM
to open...@googlegroups.com
You also need -midas_args=-DTHREAD_COUNT=2 and probably -midas_args=-DTHREAD_STRIDE=2 passed to sims. Do you have those? Katie has done this internally at some point and definitely had it run on multiple cores, so it can be made to work.

Katherine Lim

Nov 10, 2017, 4:26:56 PM
to OpenPiton Discussion
Hi Devon,

To run multithreaded tests using pitonstream, you're going to have to make some changes to at least fpga_lib.py, which can be found in the piton/tools/src/proto/ directory. Like Jon mentioned above, you need to pass in -midas_args=-DTHREAD_COUNT=2 and -midas_args=-DTHREAD_STRIDE=2 when you run sims to run 2 cores with 1 thread per core. In addition, you need to pass -x-tiles=2 and -y-tiles=1 to sims as well. These changes can be made by changing the command in runMidas() in fpga_lib. Depending on how flexible you want to be in testing, you can also modify pitonunimap and pitonstream to allow these options to be controlled when you run pitonstream/pitonunimap from the command line.

Also, make sure to regenerate the memory map as Alexey described using pitonunimap, because adding these flags to sims will change the memory map.

-Katie
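To see why the stride matters, here is the arithmetic under two assumptions that this thread does not confirm: hardware thread IDs are numbered contiguously with two threads per core, and the active threads are 0, THREAD_STRIDE, 2*THREAD_STRIDE, and so on. With THREAD_COUNT=2 and THREAD_STRIDE=2 that gives threads 0 and 2 on cores 0 and 1, matching the 2 cores, 1 thread per core setup Katie describes. The helper names are hypothetical; my_slice shows how a thread could turn its ID into a share of the work, such as part of a BFS frontier.

```c
#include <stddef.h>

/* ASSUMPTIONS for illustration only: hardware thread IDs are contiguous,
 * THREADS_PER_CORE per core, and the active threads are
 * 0, THREAD_STRIDE, 2*THREAD_STRIDE, ... (THREAD_COUNT of them).
 * midas receives THREAD_COUNT/THREAD_STRIDE via -midas_args; whether they
 * also reach the C compile is not confirmed, so defaults are given here. */
#ifndef THREAD_COUNT
#define THREAD_COUNT 2
#endif
#ifndef THREAD_STRIDE
#define THREAD_STRIDE 2
#endif
#define THREADS_PER_CORE 2

/* Core that a hardware thread ID belongs to. */
static int core_of(int hw_tid) { return hw_tid / THREADS_PER_CORE; }

/* Dense index 0..THREAD_COUNT-1 for an active hardware thread ID. */
static int logical_index(int hw_tid) { return hw_tid / THREAD_STRIDE; }

/* Give each active thread a contiguous slice [*begin, *end) of n items. */
static void my_slice(int hw_tid, size_t n, size_t *begin, size_t *end)
{
    size_t k = (size_t)logical_index(hw_tid);
    *begin = n * k / THREAD_COUNT;
    *end = n * (k + 1) / THREAD_COUNT;
}
```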

Devon White

Nov 13, 2017, 2:45:55 PM
to OpenPiton Discussion
Hi,
Thanks for the help. Adding THREAD_COUNT and THREAD_STRIDE worked to get two threads, which is great. I tried adding -x-tiles=2 and -y-tiles=1 to sims (not in the midas_args), but this causes a compilation error. The error is "Unprocessed argument: -xtiles=2". Thanks,
-Devon

Devon White

Nov 13, 2017, 2:47:13 PM
to OpenPiton Discussion
Sorry to add to this so quickly, but the error was actually "Unprocessed argument: -x-tiles=2", I was seeing if the argument was maybe misspelled.
-Devon

Devon White

Nov 13, 2017, 2:49:35 PM
to OpenPiton Discussion
Never mind, figured it out. It's "-x_tiles=2". Thanks again,
-Devon

Devon White

Nov 28, 2017, 2:51:44 PM
to OpenPiton Discussion
We are running into issues using mutexes. Do you have any tests that use multiple cores and mutexes?

Katherine Lim

Nov 28, 2017, 11:18:01 PM
to OpenPiton Discussion
Hi Devon,

There's a lock implementation that you can look at in piton_common.s. I pasted the relevant part below with some slight modification for use in unprivileged mode. You should just be able to call these routines from C code, and it should correspond to the functions lock_acquire(long long *) and lock_release(long long *).

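    .align 4
    .global lock_acquire
lock_acquire:
    save    %sp, -176, %sp      ! 176 bytes: minimum frame for 64-bit code (-m64)
    membar  #Sync
lock_loop:
    mov     1, %l0
    casx    [%i0], %g0, %l0     ! if [%i0] == 0, store 1; %l0 gets the old value
    cmp     %l0, 0
    bne     lock_loop           ! old value non-zero: lock is held, retry
    nop
    membar  #Sync
    ret
    restore                     ! delay slot: pop the register window

    .align 4
    .global lock_release
lock_release:
    save    %sp, -176, %sp
    membar  #Sync
    mov     1, %l0
    casx    [%i0], %l0, %g0     ! if [%i0] == 1, store %g0 (0)
    membar  #Sync
    ret
    restore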

-Katie
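From C, the routines above would be declared as extern void lock_acquire(long long *lock); and extern void lock_release(long long *lock);. To check the protocol off-target, the sketch below expresses the same compare-and-swap logic with GCC's __atomic builtins; lock_acquire_c, lock_release_c and locked_increment are hypothetical names, not code from piton_common.s. Acquire loops until it swaps 0 for 1, and release swaps 1 back to 0.

```c
#include <stdbool.h>

/* Spin until *lock changes from 0 to 1, mirroring
 * "mov 1, %l0; casx [%i0], %g0, %l0; cmp %l0, 0; bne lock_loop". */
void lock_acquire_c(long long *lock)
{
    for (;;) {
        long long expected = 0;
        if (__atomic_compare_exchange_n(lock, &expected, 1LL, false,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;
    }
}

/* Swap 1 back to 0, mirroring "casx [%i0], %l0, %g0" with %l0 == 1. */
void lock_release_c(long long *lock)
{
    long long expected = 1;
    __atomic_compare_exchange_n(lock, &expected, 0LL, false,
                                __ATOMIC_RELEASE, __ATOMIC_RELAXED);
}

/* Typical use: one lock word shared by all threads guards shared data. */
long long shared_lock = 0;
long long shared_counter = 0;

void locked_increment(void)
{
    lock_acquire_c(&shared_lock);
    shared_counter++;
    lock_release_c(&shared_lock);
}
```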

Devon White

Nov 30, 2017, 4:07:58 PM
to OpenPiton Discussion
We have not found a piton_common.s.
Also, the lock code you provided below hangs.
It always finds 0 at the lock memory location.

Alexey Lavrov

Dec 1, 2017, 1:53:15 PM
to OpenPiton Discussion
Hi Devon,

I would like to add a couple of comments to Katherine's post.
First, we were using those mutexes in assembly programs, where we were running in hyperprivileged mode.
Second, the locks were manually placed in a data section mapped to the hypervisor.
When you use a compiler, the locks may be placed in a different data section, and when you try to access them, the address passed to the function may be translated to something else.
I would suggest running a simulation with those locks compiled into your C program so that you can check why your code hangs.

Alexey