Tracing stops after shared memory buffer overrun


Jaeheon Lee

Apr 6, 2024, 4:40:07 AM
to Perfetto Development - www.perfetto.dev
Hi,
I am trying to capture a long Perfetto trace with this config file:

buffers {
  size_kb: 32768
  fill_policy: DISCARD
}
buffers {
  size_kb: 32768
  fill_policy: DISCARD
}
data_sources {
  config {
    name: "linux.ftrace"
    target_buffer: 0
    ftrace_config {
      symbolize_ksyms: true
      atrace_categories: "res"
      atrace_categories: "workq"
      atrace_categories: "webview"
      atrace_categories: "memory"
      atrace_categories: "idle"
      atrace_categories: "dalvik"
      atrace_categories: "freq"
      atrace_categories: "am"
      atrace_categories: "sync"
      atrace_categories: "network"
      atrace_categories: "binder_driver"
      atrace_categories: "input"
      atrace_categories: "hal"
      atrace_categories: "disk"
      atrace_categories: "view"
      atrace_categories: "sched"
      atrace_categories: "wm"
      atrace_categories: "thermal"
      atrace_categories: "gfx"
      atrace_categories: "power"
      atrace_categories: "camera"
      atrace_categories: "aidl"
      atrace_categories: "memreclaim"
      atrace_apps: "*"
      compact_sched {
        enabled: true
      }
      #buffer_size_kb: 8192
      #drain_period_ms: 1000
      buffer_size_kb: 16384
      drain_period_ms: 250
    }
  }
}
data_sources {
  config {
    name: "android.gpu.memory"
    target_buffer: 0
  }
}
data_sources {
  config {
    name: "linux.process_stats"
    target_buffer: 1
    process_stats_config {
      proc_stats_poll_ms: 60000
    }
  }
}
data_sources {
  config {
    name: "android.power"
    target_buffer: 1
    android_power_config {
      battery_poll_ms: 1000
      collect_power_rails: true
      battery_counters: BATTERY_COUNTER_CAPACITY_PERCENT
      battery_counters: BATTERY_COUNTER_CHARGE
      battery_counters: BATTERY_COUNTER_CURRENT
    }
  }
}
data_sources {
  config {
    name: "android.sys_stats"
    target_buffer: 1
    sys_stats_config {
      vmstat_period_ms: 1000
    }
  }
}
data_sources {
  config {
    name: "android.surfaceflinger.frametimeline"
  }
}
data_sources {
  config {
    name: "android.hardware.camera"
    target_buffer: 1
  }
}
data_sources {
  config {
    name: "org.chromium.trace_event"
    chrome_config {
      trace_config: "{\"record_mode\":\"record-continuously\",\"included_categories\":[\"*\"]}"
    }
  }
}
data_sources {
  config {
    name: "org.chromium.trace_metadata"
    chrome_config {
      trace_config: "{\"record_mode\":\"record-continuously\",\"included_categories\":[\"*\"]}"
    }
  }
}
enable_extra_guardrails: false
statsd_metadata {
}
write_into_file: true
file_write_period_ms: 2500
flush_period_ms: 10000
notify_traceur: true
trace_uuid_msb: 771211190659565185
trace_uuid_lsb: -7906795730165107342


But when I run it on the phone with the command
cat android_default.pbtx | adb shell perfetto -c - --txt --background -o long
and generate some events, this informational log appears in Logcat:
ory_arbiter_impl.cc:192 Shared memory buffer overrun! Stalling
After that, nothing more is recorded into the trace. The trace itself covers the full timeline (2 minutes), but from the point where that log appeared (about 30 s in) it contains no events, as shown in the image below.

perfetto_trace.png

1. Is there a problem with my trace config that is causing this error?
2. I assume that increasing the shmem size would be a workaround for this issue. Would changing kDefaultShmSize in perfetto/include/perfetto/ext/tracing/core/tracing_service.h be enough? If so, is there an easier way to achieve this without having to rebuild Perfetto?

Lalit Maganti

Apr 6, 2024, 4:41:52 AM
to Jaeheon Lee, Perfetto Development - www.perfetto.dev
This is a harmless, rather spammy warning that does not require any action on your side.

If anything, the only real action is probably to downgrade it from an error log to a debug log, since seeing that log line is fully expected and normal.


Lalit Maganti

Apr 6, 2024, 4:44:33 AM
to Jaeheon Lee, Perfetto Development - www.perfetto.dev
To answer your actual question about why tracing stops, though: the warning is correlated with the problem, not the cause of it. That log line appears when there is heavy tracing activity on the device, and what is most likely happening is that you are hitting the limits of your DISCARD buffer.

I'd strongly suggest switching to RING_BUFFER for long tracing and either increasing the buffer size or reducing the file write period to resolve the issue.
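
One rough way to reason about the buffer size versus file write period trade-off: with write_into_file enabled, the buffer has to absorb whatever the data sources produce between two file writes. The sketch below is back-of-the-envelope arithmetic, not Perfetto code. The function name, the 2x headroom factor, and the 20 MB/s data rate are all illustrative assumptions; the real rate depends on the device and the enabled categories.

```python
import math


def min_buffer_kb(rate_kb_per_s: float, file_write_period_ms: int,
                  headroom: float = 2.0) -> int:
    """Estimate the smallest buffer (in KB) that holds one write period of data.

    rate_kb_per_s is an assumed average trace data rate; headroom is a
    safety multiplier for bursts. Both are hypothetical inputs.
    """
    period_s = file_write_period_ms / 1000
    return math.ceil(rate_kb_per_s * period_s * headroom)


# Assumed rate of 20 MB/s with the 2500 ms write period from the config above.
print(min_buffer_kb(20480, 2500))
# Same assumed rate with a shorter 1000 ms write period.
print(min_buffer_kb(20480, 1000))
```

Under these assumptions, shortening the write period lowers the buffer size needed, which is why the two knobs are interchangeable to some degree.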

Jaeheon Lee

Apr 6, 2024, 10:10:21 AM
to Perfetto Development - www.perfetto.dev
Thanks for your reply.
I have tried your suggestions: increasing the buffer size, using RING_BUFFER, and decreasing file_write_period_ms.

1. Using RING_BUFFER caused roughly the first half of the trace to be dropped, so the problem is not solved.
2. I tried increasing the buffers' size_kb to 102400 and to 51200, but both gave me this error:
04-06 21:39:35.279 10630 10630 E perfetto: ng_service_impl.cc:2379 writev() failed (errno: 28, No space left on device)
I kept getting this error even though top showed over 2000 MB of free memory and 3000 MB of free swap while I was monitoring.
The trace would end with that error, and when I opened it, the problem described above was still there.

3. I tried decreasing file_write_period_ms to 1000 and even to 100, but the results did not change.
I also tried decreasing flush_period_ms from 10000 to 1000, which did not change anything either.

Also, I have tried using the "Record Trace" button directly in the GUI, which uses this config:

buffers {
  size_kb: 131072
  fill_policy: RING_BUFFER
}
buffers {
  size_kb: 2048
  fill_policy: RING_BUFFER
}
data_sources {
  config {
    name: "linux.ftrace"
    target_buffer: 0
    ftrace_config {
      symbolize_ksyms: true
      atrace_categories: "res"
      atrace_categories: "webview"
      atrace_categories: "dalvik"
      atrace_categories: "am"
      atrace_categories: "network"
      atrace_categories: "input"
      atrace_categories: "hal"
      atrace_categories: "view"
      atrace_categories: "wm"
      atrace_categories: "gfx"
      atrace_categories: "power"
      atrace_categories: "camera"
      atrace_categories: "aidl"
      atrace_apps: "*"
      buffer_size_kb: 8192
      drain_period_ms: 1000
    }
  }
}
data_sources {
  config {
    name: "android.gpu.memory"
    target_buffer: 0
  }
}
data_sources {
  config {
    name: "linux.process_stats"
    target_buffer: 1
  }
}
data_sources {
  config {
    name: "android.power"
    target_buffer: 1
    android_power_config {
      battery_poll_ms: 1000
      collect_power_rails: true
      battery_counters: BATTERY_COUNTER_CAPACITY_PERCENT
      battery_counters: BATTERY_COUNTER_CHARGE
      battery_counters: BATTERY_COUNTER_CURRENT
    }
  }
}
data_sources {
  config {
    name: "android.surfaceflinger.frametimeline"
  }
}
data_sources {
  config {
    name: "android.hardware.camera"
    target_buffer: 1
  }
}
data_sources {
  config {
    name: "org.chromium.trace_event"
    chrome_config {
      trace_config: "{\"record_mode\":\"record-continuously\",\"included_categories\":[\"*\"]}"
    }
  }
}
data_sources {
  config {
    name: "org.chromium.trace_metadata"
    chrome_config {
      trace_config: "{\"record_mode\":\"record-continuously\",\"included_categories\":[\"*\"]}"
    }
  }
}
enable_extra_guardrails: false
statsd_metadata {
}
write_into_file: true
file_write_period_ms: 604800000
flush_period_ms: 30000
notify_traceur: true
incremental_state_config {
  clear_period_ms: 15000
}
statsd_logging: STATSD_LOGGING_DISABLED

This does not give any errors when I record long traces of over 2 minutes, but when I end the trace, it is still not saved properly even after waiting for over an hour.
When I pulled the trace file an hour later, before the save had finished, the file was only 137 MB and contained only part of the trace.

Lalit Maganti

Apr 6, 2024, 10:12:52 AM
to Jaeheon Lee, Perfetto Development - www.perfetto.dev
04-06 21:39:35.279 10630 10630 E perfetto: ng_service_impl.cc:2379 writev() failed (errno: 28, No space left on device)

This error is about disk space, not memory or swap. It seems we are not able to write the trace to disk because there is no space remaining. I'd check that before doing any further debugging, as anything else you've tried could be invalidated by the lack of disk space.
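
For reference, errno 28 is ENOSPC, which is reported when the filesystem holding the output file is full; free RAM or swap does not enter into it. Below is a minimal host-side sketch of the same kind of check using Python's standard library. The helper name and the 1 GiB threshold are illustrative assumptions, and on the device itself one would more likely inspect the partition holding the trace file with df over adb shell.

```python
import errno
import os
import shutil


def enough_disk_space(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


# Show the numeric value and message for ENOSPC on this platform.
print(errno.ENOSPC, os.strerror(errno.ENOSPC))

# Check whether the current directory's filesystem has 1 GiB of headroom
# (an arbitrary threshold chosen for illustration).
print(enough_disk_space(".", 1 << 30))
```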

Jaeheon Lee

Apr 6, 2024, 11:55:40 AM
to Perfetto Development - www.perfetto.dev
Thanks a lot for helping me!
That was indeed the problem: I was saving the trace to a partition with only about a gigabyte left.
Sorry for asking such dumb questions. I will make sure to check that next time.
Thanks again.