BigInt performance // unaligned access


Alexey Pavlyutkin

Feb 15, 2024, 1:56:50 AM
to v8-dev
Hi!

Currently I'm working on V8 performance for RISC-V, and BigInt seems to be one of the worst bottlenecks due to countless memcpy() calls for handling unaligned memory access. Today most RISC-V hardware does not have the vector extension, so memcpy() is implemented as byte-by-byte copying and generates ~30% overhead on BigInt microbenchmarks.

AFAIK, support for kDoubleAlignment is also far from done, right?

My question is about dynamic padding: is there a reason why we cannot allocate kTaggedSize of extra space per BigInt instance and let the digit array float to meet the ALIGNED condition? The additional integer & and + operations for calculating the nearest ALIGNED position still look much more efficient than memcpy(), even with vectors.

Thank you

Regards,
Alex

Jakob Kummerow

Feb 15, 2024, 5:00:22 AM
to v8-...@googlegroups.com
Hi!

Your understanding is correct that double-aligned allocations probably won't be available any time soon.

I don't think dynamic padding can solve the problem, because (1) objects can be moved by the GC, so you'd have to add or remove the padding during these events, which adds complexity (the object moving logic would have to check the type of the object; and objects could change their size as a result of being moved!) and reduces performance; and (2) if you had some BigInts with padding and some without, then every access to a digit would have to adjust its offset accordingly, which is either slow (when it branches) or costs extra memory (when the BigInt includes another field that contains the offset adjustment that can be added unconditionally -- and you'd still pay the price of that addition).

The idea behind the current design is that compilers translate memcpy(&local_var, some_addr, 8) to a very efficient sequence of machine instructions. For example, on x64, compilers know that they can emit a single 8-byte mov instruction for that. If that's not happening on RISC-V (yet?), there are a couple of options you could explore:
(1) Patch LLVM's RISC-V backend to emit a more efficient instruction sequence for memcpy with fixed small size.
(2) Provide #ifdef'ed RISC-V specific replacements for the four places where <v8>/src/bigint/bigint.h uses memcpy to read or write digits (e.g. manually perform two 4-byte reads/writes). My gut feeling is that this is the easiest approach, but I don't know what operations are conveniently available on RISC-V, so maybe this is harder than it sounds.
(3) Again behind an #ifdef, change the definition of digit_t on RISC-V to be uint32_t instead of uintptr_t. (You'll need a corresponding change in <v8>/src/objects/bigint.h, and adjust the digits reading/writing logic to use an updated condition for choosing the non-memcpy path, and you'll probably also need to update a couple more places where we generate code that reads/writes BigInt digits.) Note that if RISC-V can otherwise use 64-bit integer arithmetic, then this option will likely be slower than (2), because it means all operations that process digits will operate on 4-byte chunks.
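
To make the memcpy idiom and option (2) concrete, here is a minimal sketch. All names (digit64_t, ReadDigitMemcpy, ReadDigitTwoHalves, WriteDigitTwoHalves) are hypothetical and not taken from <v8>/src/bigint/bigint.h. It assumes a little-endian target, digits that are at least 4-byte aligned, and a GCC/Clang toolchain (for __builtin_assume_aligned). Whether the second variant actually compiles to two plain 32-bit loads on a given RISC-V toolchain is something to verify in the generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 64-bit digit type.
using digit64_t = uint64_t;

// The portable idiom: fixed-size memcpy into a local variable. On targets
// with fast unaligned access, compilers typically lower this to one load.
inline digit64_t ReadDigitMemcpy(const void* addr) {
  digit64_t result;
  std::memcpy(&result, addr, sizeof(result));
  return result;
}

// Sketch of option (2): two 4-byte reads combined into one 8-byte digit.
// Assumption: addr is 4-byte aligned and the target is little-endian.
inline digit64_t ReadDigitTwoHalves(const void* addr) {
  assert(reinterpret_cast<uintptr_t>(addr) % 4 == 0);
  // GCC/Clang builtin: tells the compiler the pointer is 4-byte aligned.
  const unsigned char* p =
      static_cast<const unsigned char*>(__builtin_assume_aligned(addr, 4));
  uint32_t lo, hi;
  std::memcpy(&lo, p, sizeof(lo));      // low half at the lower address
  std::memcpy(&hi, p + 4, sizeof(hi));  // high half follows
  return (static_cast<digit64_t>(hi) << 32) | lo;
}

// Matching write: split the digit and store the two halves separately.
inline void WriteDigitTwoHalves(void* addr, digit64_t value) {
  assert(reinterpret_cast<uintptr_t>(addr) % 4 == 0);
  unsigned char* p =
      static_cast<unsigned char*>(__builtin_assume_aligned(addr, 4));
  uint32_t lo = static_cast<uint32_t>(value);
  uint32_t hi = static_cast<uint32_t>(value >> 32);
  std::memcpy(p, &lo, sizeof(lo));
  std::memcpy(p + 4, &hi, sizeof(hi));
}
```

Both readers return the same value for a digit that is 4-byte but not 8-byte aligned; the difference, if any, is only in the instructions the compiler emits.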

Hope this helps,
Jakob



Alexey Pavlyutkin

Feb 16, 2024, 1:28:42 AM
to v8-dev
Hi!

Thank you

The idea was to adjust the address of the digit array with a simple rule:

adjusted_address = address + (address & kTaggedSize)

That looks very cheap even for architectures with fast unaligned access, and all other checks would be compile-time. But you're absolutely right: the GC breaks it unless there is a cheap way to move objects rather than raw blocks.
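
A minimal sketch of that rule, assuming kTaggedSize is 4 (as with pointer compression) and that the unadjusted digit array address is always a multiple of kTaggedSize. The function name AlignedDigitsStart is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4-byte tagged values, so addresses are 4-byte aligned and
// the only possible misalignment relative to 8 bytes is exactly 4.
constexpr uintptr_t kTaggedSize = 4;

// Returns the nearest 8-byte-aligned address at or after unpadded_start.
// If bit 2 of the address is set, (address & kTaggedSize) is 4 and the
// start moves forward by 4; otherwise it is 0 and nothing changes.
inline uintptr_t AlignedDigitsStart(uintptr_t unpadded_start) {
  assert(unpadded_start % kTaggedSize == 0);
  return unpadded_start + (unpadded_start & kTaggedSize);
}
```

For example, a digit array that would start at 0x1004 is shifted to 0x1008 (4 bytes of padding), while one that would start at 0x2008 stays put (no padding). The padding depends on where the object currently lives, which is why a raw block copy to a differently aligned address would leave the digits at the wrong offset.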

BTW, I am new to the V8 community and submitted my very first patch to Gerrit two days ago:

https://chromium-review.googlesource.com/c/v8/v8/+/5293559

I'm not sure I understand the process correctly. Should I do anything else to get my changes landed?

Thank you

Cheers,
Alex
