memset implementation


Alexander von Gluck

Jan 18, 2022, 7:53:08 PM
to sw-...@groups.riscv.org
We have been having some loop issues with our memset implementation (the compiler's internal memset vs. our OS memset).

If someone has a moment, could you take a look at our memset function and see if it makes sense? My RISC-V assembly isn't super strong :-)

00000000802382ea <memset>:
    802382ea:   7139                    addi    sp,sp,-64
    802382ec:   fc22                    sd      s0,56(sp)
    802382ee:   0080                    addi    s0,sp,64
    802382f0:   fca43c23                sd      a0,-40(s0)
    802382f4:   87ae                    mv      a5,a1
    802382f6:   fcc43423                sd      a2,-56(s0)
    802382fa:   fcf42a23                sw      a5,-44(s0)
    802382fe:   fd843783                ld      a5,-40(s0)
    80238302:   fef43423                sd      a5,-24(s0)
    80238306:   a829                    j       80238320 <memset+0x36>
    80238308:   fe843783                ld      a5,-24(s0)
    8023830c:   00178713                addi    a4,a5,1
    80238310:   fee43423                sd      a4,-24(s0)
    80238314:   fd442703                lw      a4,-44(s0)
    80238318:   0ff77713                zext.b  a4,a4
    8023831c:   00e78023                sb      a4,0(a5)
    80238320:   fc843783                ld      a5,-56(s0)
    80238324:   fff78713                addi    a4,a5,-1
    80238328:   fce43423                sd      a4,-56(s0)
    8023832c:   fff1                    bnez    a5,80238308 <memset+0x1e>
    8023832e:   fd843783                ld      a5,-40(s0)
    80238332:   853e                    mv      a0,a5
    80238334:   7462                    ld      s0,56(sp)
    80238336:   6121                    addi    sp,sp,64
    80238338:   8082                    ret
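For reference, the -O0 listing above is consistent with a plain byte-at-a-time loop like the sketch below. This is a reconstruction from the disassembly, not Haiku's actual source. The name `naive_memset` is hypothetical; a real build would call it `memset`. At -O0 every local lives in the stack frame, which is why the listing reloads the cursor and the count from memory on every iteration.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the loop in the -O0 listing:
 * keep a byte cursor into dest and store one byte per iteration
 * while the post-decremented count is non-zero. */
void *naive_memset(void *dest, int c, size_t n)
{
    unsigned char *p = dest;

    while (n--)                  /* ld a5; addi a4,a5,-1; sd a4; bnez a5 */
        *p++ = (unsigned char)c; /* zext.b a4,a4; sb a4,0(a5) */

    return dest;                 /* original pointer handed back in a0 */
}
```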


Thanks!

 -- Alex

Bruce Hoult

Jan 18, 2022, 8:31:37 PM
to Alexander von Gluck, RISC-V SW Dev
What on EARTH is all that?

Did you write memset() in C and then compile it with -O0?

That looks approximately correct but OMG slow.

If you want a compact, slow memset, just write it in C and compile it with -O1; it will be massively better.





Tommy Murphy

Jan 18, 2022, 8:47:13 PM
to Bruce Hoult, Alexander von Gluck, RISC-V SW Dev
> having some loop issues

What sort of issues exactly?

Alexander von Gluck

Jan 18, 2022, 9:00:00 PM
to Tommy Murphy, Bruce Hoult, RISC-V SW Dev
The backstory is that we had Haiku's riscv64 images booting, but then upgraded from gcc 8 to gcc 11.2.

As soon as we upgraded our toolchain, we started seeing crashes during boot in qemu-system-riscv64 and on the SiFive Unmatched.

https://dev.haiku-os.org/ticket/17468 has a lot of the details on the SiFive Unmatched (though it's mostly reproducible in qemu).
https://dev.haiku-os.org/ticket/17511 is the separate bug for qemu.

One of our contributors saw that there was an infinite recursion between our memset implementation and the glibc built-ins

0000000000114632 <memset>:
  114632: 01 11        	addi	sp, sp, -32
  114634: 22 e8        	sd	s0, 16(sp)
  114636: 26 e4        	sd	s1, 8(sp)
  114638: 06 ec        	sd	ra, 24(sp)
  11463a: 00 10        	addi	s0, sp, 32
  11463c: aa 84        	mv	s1, a0
  11463e: 19 c6        	beqz	a2, 0x11464c <memset+0x1a>
  114640: 93 f5 f5 0f  	andi	a1, a1, 255
  114644: 97 40 ff ff  	auipc	ra, 1048564
  114648: e7 80 c0 00  	jalr	12(ra)
  11464c: e2 60        	ld	ra, 24(sp) // <-- HERE
  11464e: 42 64        	ld	s0, 16(sp)
  114650: 26 85        	mv	a0, s1
  114652: a2 64        	ld	s1, 8(sp)
  114654: 05 61        	addi	sp, sp, 32
  114656: 82 80        	ret
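Translated back into C, the optimized listing above does roughly the following. It skips everything when the length is zero, masks the fill value to a byte, and hands the whole job to a memset call. The listing does not name the target of the auipc/jalr pair, so treating it as memset is an assumption. If that call resolves to this same function, it calls itself with the same arguments until the stack overflows, which matches the infinite recursion described here. The name `memset_as_compiled` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the optimized listing: the byte loop has
 * been replaced by a single call (assumed here to be memset). */
void *memset_as_compiled(void *dest, int c, size_t n)
{
    void *saved = dest;              /* mv s1, a0 */

    if (n != 0)                      /* beqz a2 skips the call */
        memset(dest, c & 0xFF, n);   /* andi a1,a1,255; auipc/jalr */

    return saved;                    /* mv a0, s1; ret */
}
```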


Adding -fno-builtin was the proposed fix, but I've been having trouble confirming it's sane (and my test systems still crash after setting -fno-builtin).

Also, regarding Bruce's comments: we're using a generic C memset function on riscv64. I've been building debug kernels for symbols and forgot that this builds everything at -O0 :-)
We have had a bunch of issues with gcc 11 optimizations across multiple architectures, so that part is likely not the riscv64 port's fault.

 -- Alex

Jim Wilson

Jan 18, 2022, 10:05:19 PM
to Alexander von Gluck, Tommy Murphy, Bruce Hoult, RISC-V SW Dev
On Tue, Jan 18, 2022 at 6:00 PM Alexander von Gluck <kalli...@gmail.com> wrote:
> One of our contributors saw that there was an infinite recursion between our memset implementation and the glibc built-ins

-fno-tree-loop-distribute-patterns disables this optimization (GCC recognizing a store loop as a memset pattern and replacing it with a call to memset). This is what glibc uses. You can also use -fno-builtin, but that will disable more optimizations. You can also use -ffreestanding, which disables even more optimizations.
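Besides passing the flag for a whole file, GCC's `optimize` function attribute can limit it to the one function that needs it. glibc has used a macro along these lines (`inhibit_loop_to_libcall`) for its own string routines. The macro name `NO_LOOP_TO_LIBCALL` and the function `guarded_memset` below are hypothetical. The attribute is GCC-specific, so this sketch compiles it out for other compilers.

```c
#include <stddef.h>

/* GCC-only: disable loop-to-libcall pattern recognition for a single
 * function, the per-function counterpart of
 * -fno-tree-loop-distribute-patterns. Other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((__optimize__("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

NO_LOOP_TO_LIBCALL
void *guarded_memset(void *dest, int c, size_t n)
{
    unsigned char *p = dest;

    while (n--)
        *p++ = (unsigned char)c;   /* GCC leaves this as a byte loop */

    return dest;
}
```

The rest of the kernel keeps the optimization. Only the function carrying the attribute loses it.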

Jim

Bruce Hoult

Jan 18, 2022, 10:10:39 PM
to Alexander von Gluck, Tommy Murphy, RISC-V SW Dev
On Wed, Jan 19, 2022 at 2:59 PM Alexander von Gluck <kalli...@gmail.com> wrote:
> One of our contributors saw that there was an infinite recursion between our memset implementation and the glibc built-ins

That's not in the code you initially showed, which was correctly self-contained.

Robert Lipe

Jan 19, 2022, 3:45:57 AM
to RISC-V SW Dev
On Tuesday, January 18, 2022 at 9:05:19 PM UTC-6 jim.wil...@gmail.com wrote:

> -fno-tree-loop-distribute-patterns disables this optimization.  This is what glibc uses.  You can also use -fno-builtin but that will disable more optimizations.  You can also use -ffreestanding which disables even more optimizations.

In broad strokes, if you're building an OS kernel, you probably always want to build the entire kernel with -ffreestanding. You rarely have the luxury of a full libc in the kernel, and you don't want GCC "helpfully" turning a call to your own tiny printf into a call to puts, for example. That will frustrate you, because you'll look at your code and not SEE a call to puts. Some kernels give their "printf" a different name to avoid exactly this.

If you had an implementation of memset with two loops (a word loop and a byte loop) and were building with optimization and without -ffreestanding, I think there was a case where the compiler would see the tail end and recognize "oh, I can make this a memset" (Voiceover: while building memset), which would result in infinite recursion. Jim's suggestion should stop that from happening.
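A memset of the shape described above might look like the sketch below. It uses a byte loop for any unaligned head, a word loop for the bulk, and a byte loop for the tail. The head loop is an addition to the two-loop description: misaligned word stores can be slow, or trapped and emulated, on some RISC-V implementations. The name `wide_memset` and the `word_t` typedef are hypothetical. The sketch assumes GCC or Clang (for `__may_alias__`) and assumes the file is built with -fno-tree-loop-distribute-patterns as Jim suggests, so the byte loops cannot be turned back into memset calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Word type allowed to alias any object (GCC/Clang extension), so the
 * wide stores below don't break strict-aliasing rules. */
typedef uintptr_t __attribute__((__may_alias__)) word_t;

/* Hypothetical two-loop memset (plus an alignment head). */
void *wide_memset(void *dest, int c, size_t n)
{
    unsigned char *p = dest;
    unsigned char b = (unsigned char)c;

    /* Head: byte stores until p is word-aligned. */
    while (n > 0 && ((uintptr_t)p & (sizeof(word_t) - 1)) != 0) {
        *p++ = b;
        n--;
    }

    /* Replicate b into every byte: 0x0101...01 * b. */
    word_t pattern = ((word_t)-1 / 0xFF) * b;

    /* Word loop: whole aligned words. */
    word_t *w = (word_t *)p;
    while (n >= sizeof(word_t)) {
        *w++ = pattern;
        n -= sizeof(word_t);
    }

    /* Byte loop: the remaining tail (the part the compiler may pattern-match). */
    p = (unsigned char *)w;
    while (n > 0) {
        *p++ = b;
        n--;
    }

    return dest;
}
```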

I personally find reading -O1 code easier as it's closer to a straight mechanical translation of what you see on the screen to what you EXPECT to see.

I think that what you tested and what you showed us either wasn't the same code or lacked context.

 