Hi!
Your understanding is correct that double-aligned allocations probably won't be available any time soon.
I don't think dynamic padding can solve the problem, for two reasons. (1) Objects can be moved by the GC, so you'd have to add or remove the padding during these events, which adds complexity (the object-moving logic would have to check the type of the object, and objects could change their size as a result of being moved!) and reduces performance. (2) If you had some BigInts with padding and some without, then every access to a digit would have to adjust its offset accordingly, which is either slow (when it branches) or costs extra memory (when the BigInt includes another field containing the offset adjustment, which can then be added unconditionally -- and you'd still pay the price of that addition).
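To illustrate the cost in (2): with a mix of padded and unpadded BigInts, each digit access either branches on a flag or unconditionally adds a stored offset. Here's a minimal sketch of the offset-field variant; the struct, field, and function names are mine, not V8's:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: a BigInt-like object that carries an explicit
// offset field so digit accesses can skip optional padding without
// branching -- at the price of one extra field and one extra add on
// every access.
struct BigIntSketch {
  uint32_t digit_offset;      // 0 (no padding) or 4 (padded); paid on every access
  unsigned char storage[64];  // raw digit storage

  uint64_t digit(int i) const {
    uint64_t d;
    // The offset addition happens unconditionally, padded or not.
    std::memcpy(&d, storage + digit_offset + i * sizeof(uint64_t), sizeof(d));
    return d;
  }
};
```
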
The idea behind the current design is that compilers translate memcpy(&local_var, some_addr, 8) to a very efficient sequence of machine instructions. For example, on x64, compilers know that they can emit a single 8-byte mov instruction for that. If that's not happening on RISC-V (yet?), there are a couple of options you could explore:
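For concreteness, here's the idiom in isolation (function name is mine, not the one in the V8 sources): a fixed-size memcpy into a local, which compilers on targets with fast unaligned access typically lower to a single load.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a 64-bit digit from a possibly-unaligned address. The
// fixed-size memcpy is well-defined C++ (no strict-aliasing or
// alignment UB), and on x64 compilers emit it as one 8-byte mov.
uint64_t ReadDigit(const void* addr) {
  uint64_t digit;
  std::memcpy(&digit, addr, sizeof(digit));
  return digit;
}
```
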
(1) Patch LLVM's RISC-V backend to emit a more efficient instruction sequence for memcpy calls with a fixed small size.
(2) Provide #ifdef'ed RISC-V specific replacements for the four places where <v8>/src/bigint/bigint.h uses memcpy to read or write digits (e.g. manually perform two 4-byte reads/writes). My gut feeling is that this is the easiest approach, but I don't know what operations are conveniently available on RISC-V, so maybe this is harder than it sounds.
(3) Again behind an #ifdef, change the definition of digit_t on RISC-V to be uint32_t instead of uintptr_t. (You'll need a corresponding change in <v8>/src/objects/bigint.h, you'll have to adjust the digit reading/writing logic to use an updated condition for choosing the non-memcpy path, and you'll probably also need to update a couple more places where we generate code that reads/writes BigInt digits.) Note that if RISC-V can otherwise use 64-bit integer arithmetic, then this option will likely be slower than (2), because it means all operations that process digits will operate on 4-byte chunks.
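The core of option (3) is just a conditional typedef; a rough sketch (the macro name is illustrative, not the one V8 uses):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of option (3): use a 32-bit digit type on
// targets where 8-byte digit accesses are slow. Halving the digit
// size roughly doubles the number of digits (and digit operations)
// per BigInt, which is why option (2) is likely faster.
#ifdef V8_SKETCH_RISCV_32BIT_DIGITS
using digit_t = uint32_t;
#else
using digit_t = uintptr_t;
#endif
```
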
Hope this helps,
Jakob