Timestamps of Google traces in combination with scheduler


Linus Witschen

Nov 13, 2024, 12:27:17 PM
to DynamoRIO Users
Hello,

I am currently analyzing the Google workload traces with the cache simulator. 
I am interested in the LLC miss rate, and I have extended the cache simulator to recognize timestamp markers and record timestamped LLC misses. I want to keep the Google traces in order with respect to their timestamps in the cache simulator (I do not want to simulate them on a different machine; I want to understand how the application behaved in the data center), so I am using a single CPU core (i.e., the parameter '-cores 1'). This resolved my initial issue of seeing out-of-order timestamps for LLC misses (i.e., later timestamps were seen before earlier ones).

Initially, I was using 100 traces and I used the first and the last timestamp to compute an execution time for the application. The execution time here was 3.1 seconds.
When I extended the experiment to 600 traces (which include the initial 100 traces), I got an execution time of 2.8 seconds from the timestamps. That is, adding more traces reduced the execution time by 300 ms, which should not be possible. I then found that the scheduler is 'updating' the timestamps.
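To illustrate why a shrinking execution time is suspicious, here is a minimal Python sketch. It assumes the timestamps have already been extracted into plain lists of microsecond values; the function names and the sample numbers are hypothetical and are not part of DynamoRIO. It does not model the scheduler. It only shows that a span computed as max minus min over the original timestamps cannot shrink when traces are added, whereas "last seen minus first seen" can change if the order or the values change.

```python
def first_to_last_span(timestamps):
    """Span between the first and last timestamp in the order they were seen."""
    ts = list(timestamps)
    if not ts:
        raise ValueError("no timestamps")
    return ts[-1] - ts[0]


def min_to_max_span(timestamps):
    """Span between the earliest and latest timestamp, regardless of order."""
    ts = list(timestamps)
    if not ts:
        raise ValueError("no timestamps")
    return max(ts) - min(ts)


# Hypothetical timestamps in microseconds, for illustration only.
subset = [1_000_000, 2_500_000, 4_100_000]
# Adding traces can only add timestamps; here they arrive out of order.
superset = subset + [900_000, 3_000_000]

# The min-to-max span over a superset is never smaller than over the subset.
assert min_to_max_span(superset) >= min_to_max_span(subset)

# The first-to-last span depends on visit order and can shrink.
print(min_to_max_span(subset), min_to_max_span(superset), first_to_last_span(superset))
```

If the min-to-max span over the original timestamps still shrinks with more traces, the values themselves must have changed, which matches the observation that the scheduler rewrites them.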

I now have multiple questions:
  1. Why are the original timestamps modified?
  2. If I want to use the original timestamps, can I simply do that to perform a time-based analysis (an approximated one is fine, too)?
  3. Is my approach of using a single CPU core in the cache simulator reasonable to enforce an in-order execution w.r.t. the timestamps?

I just started using DynamoRIO, so feel free to point out any horrific mistakes I have made.

Thanks and best regards,
Linus

Derek Bruening

Nov 13, 2024, 12:41:50 PM
to Linus Witschen, DynamoRIO Users
Please see the documentation on As-Traced Schedule Limitations and Dynamic Scheduling which explains that the schedule recorded during tracing (the timestamps and CPU assignments and implied context switches and migrations) is not representative due to tracing overhead.  Thus, we do not recommend using that as-traced schedule.  The scheduler does support replaying it for a trace that has the as-traced schedule in the cpu_schedule.bin.zip file, which is generated when the trace is created.  We did not provide that file for the Google Workload Traces; but as just noted we don't recommend using that in any case.


Linus Witschen

Nov 13, 2024, 1:07:33 PM
to DynamoRIO Users
Hello Derek,

Thanks for the quick reply!

As far as I understand the documentation, you would not recommend using either technique (as-traced or dynamic scheduling) for a time analysis. Is that correct? Are you aware of any alternatives that I could try for better timing? 

Would analyzing the Google traces in this direction be useful at all, or is the timing information too far off to get anything meaningful out of it? If I were operating at timescales of seconds rather than microseconds or milliseconds, would the schedule still be non-representative (say, I want to analyze events within a time interval of a couple of seconds)?

Thanks and best regards,
Linus


Derek Bruening

Nov 19, 2024, 12:05:11 PM
to Linus Witschen, DynamoRIO Users
General interval analysis over time has direct support in our analyzer infrastructure: see the "-interval_microseconds" option described at https://dynamorio.org/sec_drcachesim_newtool.html. But if an analysis depends on scheduling onto cores, we suggest using a dynamic schedule. For the Google Workload Traces in particular, a version 2 release is imminent which contains further information, including better re-scheduling fidelity.
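
As a rough illustration of what interval analysis over time means, the following Python sketch buckets timestamped events into fixed-width intervals. It is not the analyzer's implementation and does not use any DynamoRIO API. The function name is hypothetical, and it assumes the LLC miss timestamps have already been exported as integer microsecond values.

```python
from collections import Counter


def misses_per_interval(miss_timestamps_us, interval_us):
    """Count events per fixed-width interval, measured from the earliest event."""
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    ts = sorted(miss_timestamps_us)
    if not ts:
        return {}
    start = ts[0]
    # The interval index is the integer number of whole intervals since the start.
    counts = Counter((t - start) // interval_us for t in ts)
    return dict(sorted(counts.items()))


# Hypothetical miss timestamps in microseconds, bucketed into 1-second intervals.
sample = [0, 10, 1_000_005, 2_500_000, 2_999_999]
print(misses_per_interval(sample, 1_000_000))  # {0: 2, 1: 1, 2: 2}
```

Intervals with no events are simply absent from the result; a caller that needs a dense series would have to fill in zeros.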
