questions regarding v8's riscv and arm simulator

Qiaowen Yang

Nov 29, 2023, 3:04:28 AM

to v8-users

Hi,

I'm using V8's simulators to run RISC-V and Arm code. My question is why the values of icount_ differ so much between the RISC-V simulator and the Arm simulator.

One of my target benchmarks is navier-stokes.js. The code size of this benchmark is 1680 on RISC-V and 1388 on Arm, but the icount_ values I collect from the RISC-V and Arm simulators are 20942139 and 4421054 respectively. So the RISC-V simulator seems to execute many more instructions than the Arm one, and I don't know why.

Besides, in pipeline.cc there are three phases, namely PrepareJob, ExecuteJob and FinalizeJob. I find that after the FinalizeJob phase, which should be the exit of V8, the RISC-V simulator doesn't exit but runs the PrepareJob and ExecuteJob phases again. I'm guessing that may explain why the icount_ of the RISC-V simulator is much bigger, but I still wonder why this happens.

Thanks for any guidance!

Regards,

Qiaowen Yang

Yahan Lu

Nov 29, 2023, 10:36:00 AM

to v8-users

Hi,

Can you share your test code, navier-stokes.js?

Qiaowen Yang

Dec 4, 2023, 3:59:48 AM

to v8-users

Hi,

Sorry for the late response. I found this test code at https://chromium.googlesource.com/external/github.com/WebKit/webkit/+/refs/tags/Safari-612.1.27.3.3/PerformanceTests/JetStream2/Octane/navier-stokes.js

Yahan Lu

Dec 7, 2023, 4:44:32 AM

to v8-users

I ran navier-stokes.js at commit c39bb5225d771e130468d5d75a9988944950e95e.

My results (instruction counts):

riscv64: 19061085

arm64: 15599557

riscv64 executes about 22% more instructions than arm64.

I think this is in line with expectations.
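The 22% figure follows directly from the two counts. As a minimal sketch of the arithmetic (the helper name `PercentMore` is made up for illustration and is not part of V8):

```cpp
#include <cstdint>

// Hypothetical helper: percentage by which `count` exceeds `baseline`.
double PercentMore(uint64_t count, uint64_t baseline) {
  return (static_cast<double>(count) / static_cast<double>(baseline) - 1.0) *
         100.0;
}
```

With the counts above, `PercentMore(19061085, 15599557)` comes out at roughly 22.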

Currently the riscv64 port of V8 targets the rv64gc ISA, which does not include the B extension.

riscv64 therefore needs more instructions to implement some macro instructions, such as:

Push a1:

    addi sp, sp, -8
    sd a1, 0(sp)

Zero extend:

    slli a0, a0, 32
    srli a0, a0, 32

Load a pointer:

    lui(rd, (int32_t)high_20);
    addi(rd, rd, low_12);  // 31 bits in rd.
    slli(rd, rd, 11);      // Space for next 11 bits
    ori(rd, rd, b11);      // 11 bits are put in. 42 bits in rd
    slli(rd, rd, 6);       // Space for next 6 bits
    ori(rd, rd, a6);       // 6 bits are put in. 48 bits in rd
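To make it concrete why these sequences need several instructions, here is a rough sketch that emulates two of them on a plain 64-bit integer. It is not V8 code: the function names are invented, zero-extension is modelled as a left shift followed by a logical right shift by 32, and the way the pointer is split into high_20, low_12, b11 and a6 is an assumption inferred from the bit counts in the comments above. It also assumes the address is below 2^47, so that lui does not sign-extend into the upper bits.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of `value` to 64 bits (bits < 64).
int64_t SignExtend(uint64_t value, int bits) {
  const uint64_t sign = uint64_t{1} << (bits - 1);
  value &= (uint64_t{1} << bits) - 1;
  return static_cast<int64_t>((value ^ sign) - sign);
}

// Zero-extend the low 32 bits using a shift pair.
uint64_t ZeroExtendWord(uint64_t a0) {
  a0 <<= 32;  // slli a0, a0, 32: discard the upper 32 bits
  a0 >>= 32;  // srli a0, a0, 32: shift back, filling with zeros
  return a0;
}

// Emulates the six-instruction pointer load on a 64-bit register.
// Assumption: imm is an address below 2^47.
uint64_t EmulateLoadPointer(uint64_t imm) {
  assert(imm < (uint64_t{1} << 47));
  // Assumed split of the immediate into the four operands.
  const uint64_t a6 = imm & 0x3f;                     // bits 0..5
  const uint64_t b11 = (imm >> 6) & 0x7ff;            // bits 6..16
  const uint64_t high_31 = (imm >> 17) & 0x7fffffff;  // bits 17..47
  // addi sign-extends its 12-bit immediate, so round high_20 up whenever
  // low_12 will be treated as negative.
  const uint64_t high_20 = (high_31 + 0x800) >> 12;
  const uint64_t low_12 = high_31 & 0xfff;

  uint64_t rd = static_cast<uint64_t>(SignExtend(high_20 << 12, 32));  // lui
  rd += static_cast<uint64_t>(SignExtend(low_12, 12));                 // addi
  rd <<= 11;  // slli: make room for the next 11 bits
  rd |= b11;  // ori
  rd <<= 6;   // slli: make room for the last 6 bits
  rd |= a6;   // ori
  return rd;
}
```

Under these assumptions, `EmulateLoadPointer(p)` returns `p` again for any address below 2^47, which shows how the six instructions rebuild the full pointer piece by piece.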

My test patch:

    luyahan@plct-c7:~/v8/v8/out/arm64.release$ git diff
    diff --git a/src/execution/arm64/simulator-arm64.cc b/src/execution/arm64/simulator-arm64.cc
    index ec1cfd9f8eb..327f282bacc 100644
    --- a/src/execution/arm64/simulator-arm64.cc
    +++ b/src/execution/arm64/simulator-arm64.cc
    @@ -425,9 +425,13 @@ void Simulator::Run() {
       if (v8_flags.stop_sim_at == 0) {
         // Fast version of the dispatch loop without checking whether the simulator
         // should be stopping at a particular executed instruction.
    +    std::cout << "icount " << icount_for_stop_sim_at_ << std::endl;
         while (pc_ != kEndOfSimAddress) {
           ExecuteInstruction();
    +      icount_for_stop_sim_at_ =
    +          base::AddWithWraparound(icount_for_stop_sim_at_, 1);
         }
    +    std::cout << "icount " << icount_for_stop_sim_at_ << std::endl;
       } else {
         // v8_flags.stop_sim_at is at the non-default value. Stop in the debugger
         // when we reach the particular instruction count.
    diff --git a/src/execution/riscv/simulator-riscv.cc b/src/execution/riscv/simulator-riscv.cc
    index c01652c5ef2..04550a08013 100644
    --- a/src/execution/riscv/simulator-riscv.cc
    +++ b/src/execution/riscv/simulator-riscv.cc
    @@ -7933,6 +7933,7 @@ void Simulator::Execute() {
       // Get the PC to simulate. Cannot use the accessor here as we need the
       // raw PC value and not the one used as input to arithmetic instructions.
       sreg_t program_counter = get_pc();
    +  std::cout << "icount " << icount_ << std::endl;
       while (program_counter != end_sim_pc) {
         Instruction* instr = reinterpret_cast<Instruction*>(program_counter);
         icount_++;
    @@ -7945,6 +7946,7 @@ void Simulator::Execute() {
         CheckBreakpoints();
         program_counter = get_pc();
       }
    +  std::cout << "icount " << icount_ << std::endl;
     }
     
     void Simulator::CallInternal(Address entry) {

Qiaowen Yang

Dec 8, 2023, 3:02:17 AM

to v8-users

Hi,

Thanks for your work!

I have one more question. Although the icount of rv64 is much larger than that of arm64, when running the SunSpider benchmark with V8's csuite.py, I found that the execution time on rv64 is on average lower than on arm64. I'm curious why this happens. FYI, my experiments were conducted at commit 1cfed53b01c63251f915e419deb231e26ae9911c.

Besides, given your explanation of icount, may I assume that if I add the B extension to both V8's rv64 compilation pipeline and V8's rv64 simulator, the gap between rv64 and arm64 will be reduced?

Thanks again for your detailed explanation.

Best,

Qiaowen
