Extremely long optimizer pass


Andrey Sidorov

Jun 28, 2023, 8:02:53 PM
to v8-dev
Hi,

Crossposting from node, issue https://github.com/nodejs/node/issues/48581

We have a CPU spike in a node process while no JS is being executed. The time is likely spent in an optimiser thread. 

Steps to reproduce: run the first script in https://gist.github.com/sidorares/128160e6b3dea1da3ad45cd672651d2d#file-repro1-js and watch the CPU stay at 100% for quite some time until the process exits.

Stacktrace:

* thread #5
  * frame #0: 0x0000000101625d24 node`v8::internal::compiler::LoadElimination::AbstractField::Kill(v8::internal::compiler::LoadElimination::AliasStateInfo const&, v8::internal::MaybeHandle<v8::internal::Name>, v8::internal::Zone*) const + 68
    frame #1: 0x00000001016287cb node`v8::internal::compiler::LoadElimination::AbstractState::KillFields(v8::internal::compiler::Node*, v8::internal::MaybeHandle<v8::internal::Name>, v8::internal::Zone*) const + 107
    frame #2: 0x0000000101623484 node`v8::internal::compiler::LoadElimination::ReduceStoreField(v8::internal::compiler::Node*, v8::internal::compiler::FieldAccess const&) + 900
    frame #3: 0x000000010155578a node`v8::internal::compiler::Reducer::Reduce(v8::internal::compiler::Node*, v8::internal::compiler::ObserveNodeManager*) + 26
    frame #4: 0x00000001016931e9 node`v8::internal::compiler::(anonymous namespace)::SourcePositionWrapper::Reduce(v8::internal::compiler::Node*) + 57
    frame #5: 0x00000001015565aa node`v8::internal::compiler::GraphReducer::Reduce(v8::internal::compiler::Node*) + 154
    frame #6: 0x00000001015560f5 node`v8::internal::compiler::GraphReducer::ReduceTop() + 613
    frame #7: 0x0000000101555c38 node`v8::internal::compiler::GraphReducer::ReduceNode(v8::internal::compiler::Node*) + 216
    frame #8: 0x0000000101693dee node`v8::internal::compiler::LoadEliminationPhase::Run(v8::internal::compiler::PipelineData*, v8::internal::Zone*) + 718
    frame #9: 0x000000010168501b node`auto v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::LoadEliminationPhase>() + 123
    frame #10: 0x00000001016818f7 node`v8::internal::compiler::PipelineImpl::OptimizeGraph(v8::internal::compiler::Linkage*) + 455
    frame #11: 0x00000001016814fe node`v8::internal::compiler::PipelineCompilationJob::ExecuteJobImpl(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) + 142
    frame #12: 0x000000010034a01b node`v8::internal::OptimizedCompilationJob::ExecuteJob(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) + 43
    frame #13: 0x00000001003778e3 node`v8::internal::OptimizingCompileDispatcher::CompileNext(v8::internal::TurbofanCompilationJob*, v8::internal::LocalIsolate*) + 35
    frame #14: 0x0000000100378359 node`v8::internal::OptimizingCompileDispatcher::CompileTask::RunInternal() + 425
    frame #15: 0x000000010015304a node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 362

Is this a known issue/bug?
Any hints on 1) how to reduce the repro example even further, 2) what is causing the issue, or 3) which node flags to use to test with the optimiser on/off?

Thanks,
Andrey

Ben Noordhuis

Jun 29, 2023, 12:40:12 AM
to v8-...@googlegroups.com
I forgot to mention it in the issue but the delay goes away when I
disable Turbofan with --noopt or --max-opt=2.

Tested V8 version is 11.3.244.8-node.9.

Sam Parker-Haynes

Aug 15, 2023, 6:32:29 AM
to v8-dev
Just from looking at the stack trace, does --no-turbo-load-elimination resolve the issue? If so, I wonder if the pass could just abandon its efforts in the presence of a large number of elements. I don't suppose you've tried reducing your case until the performance is acceptable? This would help gauge where the quadratic behaviour becomes too much.
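
To make the quadratic shape and the "abandon its efforts" idea concrete, here is a toy model. It is not V8's LoadElimination code: it only assumes that each store has to inspect every fact tracked so far, and that a hypothetical cap (maxTracked) makes the pass give up early. All names are made up for illustration.

```javascript
// Toy model only, NOT V8's implementation. Each "store" walks every fact
// tracked so far (standing in for a kill over possibly aliasing fields),
// then records a new fact. An optional cap makes the pass bail out.
function runToyPass(storeCount, maxTracked = Infinity) {
  const tracked = []; // facts of the form { object, field }
  let visits = 0;     // number of tracked entries inspected in total

  for (let i = 0; i < storeCount; i++) {
    // Hypothetical budget: stop once too many facts are tracked.
    if (tracked.length >= maxTracked) {
      return { visits, bailedOut: true };
    }
    // Inspect every tracked fact to see whether this store could alias it.
    for (const entry of tracked) {
      visits++;
      if (entry.object === i) {
        entry.killed = true; // never taken here: every store uses a new object
      }
    }
    tracked.push({ object: i, field: 'x' });
  }
  return { visits, bailedOut: false };
}

console.log(runToyPass(1000).visits);  // 499500
console.log(runToyPass(2000).visits);  // 1999000, about 4x for 2x the stores
console.log(runToyPass(1000, 100));    // { visits: 4950, bailedOut: true }
```

Doubling the number of stores roughly quadruples the work, which is the growth pattern to look for when shrinking the repro; the cap trades missed optimisations for bounded compile time.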

Thanks,
Sam

Daniel Lehmann

Aug 15, 2023, 7:04:01 AM
to v8-...@googlegroups.com, Darius Mercadier

There were two phases in Turbofan that led to long compile times:
1) The register allocator had some quadratic behavior in AssignSpillSlots, which is fixed on tip-of-tree (see the changes linked in the issue).
2) @Darius Mercadier improved load elimination as well. From a quick look, this seems to be the relevant CL, but he knows more details.

With tip-of-tree, repro1.js now compiles much quicker than before; e.g., the output of d8 --turbo-stats repro.js is:
----------------------------------------------------------------------------------------------------------------------
                Turbofan phase            Time (ms)                      Space (bytes)            Growth MOps/s Function
                                                                Total         Max.     Abs. max.
----------------------------------------------------------------------------------------------------------------------
[...]
              V8.TFLoadElimination    169.706 ( 7.7%)    76054904 ( 9.1%)   76007920  180465744                 next
[...]                                   -----------------------------------------------------------------------------------
             V8.TFAssignSpillSlots    145.702 ( 6.6%)    11519888 ( 1.4%)   11519856  178731536                 next
[...]
----------------------------------------------------------------------------------------------------------------------
                            totals   2201.396 (100.0%)   834915552 (100.0%)  274093216  274093216                 next


I.e., total compile time is now around 2s. The root cause is still a very large Turbofan graph due to inlining. Potential solutions are described in more detail in the issue above; in the long term, Turboshaft might help with this.

Best,
Daniel
