Help with analysing JIT perf report

8 views
Skip to first unread message

Max Lapshin

unread,
Jun 23, 2021, 4:39:41 AMJun 23
to Erlang-Questions Questions
Perf report shows me things that I cannot analyse:


+ 63.85% 51.25% streamer beam.lcnt.smp [.] erts_schedule
+ 59.99% 0.02% streamer beam.lcnt.smp [.]
erts_dirty_process_main
+ 29.50% 0.04% streamer jitted-881027-8664.so [.]
gen_server:handle_msg/6
+ 25.66% 0.04% streamer jitted-881027-34178.so [.]
mpts_reader:handle_info/2


mpts_reader is my code and thanks for pointing out to me that it seems
to be the CPU eater.

I cannot understand what means that erts_schedule takes 63 / 51% of time.
Is erts_schedule a container to all functions below?

Max Lapshin

unread,
Jun 23, 2021, 5:00:52 AMJun 23
to Erlang-Questions Questions
I've clicked 'a' on this erts_schedule and see following:


Percent│ mov erts_msacc_enabled,%eax
│ movq $0x0,0x20(%rsp)
│ test %eax,%eax
│ ↓ je 7e3
│ mov erts_msacc_key,%edi
│ → callq ethr_tsd_get
│ mov %rax,0x20(%rsp)
│ 7e3: test %ebx,%ebx
│ movl $0x1,0x18(%rsp)
│ ↓ jne 1702
│ 7f3: mov 0xb0(%r14),%eax
│ test %eax,%eax
│ ↓ je 1660
│ movabs $0x7fffffffffffffff,%r13
│ xor %ebx,%ebx
│ 80e: mov $0x64,%edx
│ ↓ jmp 826
4.73 │ 815: subl $0x1,0x2c(%rsp)
1.64 │ mov 0x2c(%rsp),%eax
1.31 │ ┌──test %eax,%eax
│ ├──jle 15a0
0.03 │ 826:│ mov 0x18(%rbp),%r12d
0.01 │ │ mov %r12d,%eax
│ │ and $0x9,%eax
0.67 │ │ cmp $0x9,%eax
│ │↓ jne 15a0
86.62 │ │ pause
0.06 │ │ sub $0x1,%edx
3.92 │ │↑ jne 815
0.16 │ │→ callq sched_yield@plt
0.00 │ │ test %eax,%eax
│ │ mov $0x64,%edx




82% on some instruction pause

I will try to run this again on non-lcnt beam, but seems that this is
somewhere around spinlock?

Lukas Larsson

unread,
Jun 23, 2021, 5:19:22 AMJun 23
to Max Lapshin, Erlang-Questions Questions
Hello!

erts_schedule is the function that is responsible to selecting which thing to run next (port, process, etc). When trying to figure out what it should run it takes a bunch of locks, some of them are spinlocks. You should not really be spending all that much time in this function, so something is fishy.

It could be that the lcnt emulator has a bug/feature that could explain this, so (as you say) try running without it and see if you get the same problem.
It could be that lcnt+perf has some bug.
Or it could be something completely different that I can't think of.

Lukas

Jaka Bac

unread,
Jun 23, 2021, 5:23:39 AMJun 23
to Max Lapshin, Erlang-Questions Questions
Hi

PAUSE instruction is probably part of the spinlock. This would imply that a spinlock is busy-waiting a lot

Jaka

Max Lapshin

unread,
Jun 23, 2021, 5:39:31 AMJun 23
to Lukas Larsson, Erlang-Questions Questions
I've checked it without lcnt and situation is the same.

I have NIF ERL_NIF_DIRTY_JOB_IO_BOUND that uses enif_select to read
data with function read from DVB card.

I've seen that this pause instruction is somewhere nearby driver_entry@....

I've checked: one process that reads data from DVB card gets about 350
ready_input messages per second.

Maybe I don't need to use ERL_NIF_DIRTY_JOB_IO_BOUND flag if I know
that socket is ready for reading and non-blocked?

Lukas Larsson

unread,
Jun 23, 2021, 5:55:32 AMJun 23
to Max Lapshin, Erlang-Questions Questions
On Wed, Jun 23, 2021 at 11:39 AM Max Lapshin <max.l...@gmail.com> wrote:
We do not use dirty schedulers for reading from sockets, so if DVB cards behave similarly to sockets then you can skip using a dirty scheduler.

Rapidly scheduling processes in between normal and dirty schedulers could cause a lot of time to be spent in erts_schedule.

Lukas

Max Lapshin

unread,
Jun 23, 2021, 5:58:49 AMJun 23
to Lukas Larsson, Erlang-Questions Questions
This is exactly what I have done wrong =)

I've put read function into nif call with ERL_NIF_DIRTY_JOB_IO_BOUND
flag and it has ruined whole performance.

Now I have disabled it and performance now is normal. My functions
are in perf top now.

Thank you for help and thanks for all the tools, they are wonderful!
Reply all
Reply to author
Forward
0 new messages