RFC: Google Workload Traces Version 2 Additional Information


Derek Bruening

Sep 26, 2022, 5:07:10 PM
to DynamoRIO Users

We hope to release a new set of Google Workload Traces, Version 2, with additional information that should improve the accuracy of studies and analyses based on these traces.  Google takes user privacy seriously and, when in doubt, takes a very conservative position, so we are limited in what we can share.  Within these constraints, we are considering the items listed below.  We are soliciting input on this proposed information: is each item useful as listed?  Are there tweaks that would make it more useful?


Below is the list of proposed items to possibly add to these traces.  As a disclaimer, this list may need to be revised and some items may end up being infeasible to provide.

  • Instruction categories: For each instruction fetch, a set of instruction categories will be provided.  These categories may include:

    • Integer operation

    • Floating-point operation

    • Vector operation

    • Logical operation

    • Load/store

    • Branch

    • “Complex” multi-sub-step operation

    • Barrier/synchronization

    • System call

A single instruction may combine multiple categories.

  • Operand dependencies: For each instruction fetch, the last N instructions on which it depends will be identified, where N is a low number such as 2.

  • Context switch dilation: the tracing overhead increases the context switch frequency per retired instruction, but this factor varies by workload.  The factor will be included.

  • Suggested thread scheduling for same core configuration: A recommended scheduling of software thread segments onto cores will be provided for use when simulating the same core configuration as in the traced environment.  The schedule undoes the context switch overhead while maintaining thread dependencies.

  • Virtual-to-physical mapping: A suggested mapping of virtual to physical pages will be included.  This may not be the actual mapping during tracing but a reasonable substitute.

  • Multi-tenant mixes: A suggested combination of workload traces to simulate together to study whole machine loads.
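
As an illustration of the statement that a single instruction may combine multiple categories, here is a minimal sketch that models the proposed categories as combinable bit flags. The names `InstrCategory` and `count_by_category` are hypothetical and chosen only to mirror the list above; they are not part of any released trace format or DynamoRIO API.

```python
from collections import Counter
from enum import Flag, auto


class InstrCategory(Flag):
    """Hypothetical flags mirroring the proposed category list."""
    INTEGER = auto()
    FLOAT = auto()
    VECTOR = auto()
    LOGICAL = auto()
    LOAD_STORE = auto()
    BRANCH = auto()
    COMPLEX = auto()
    BARRIER = auto()
    SYSCALL = auto()


def count_by_category(instrs):
    """Tally instructions per category; a multi-category instruction
    contributes to every category it belongs to."""
    counts = Counter()
    for cats in instrs:
        for cat in InstrCategory:
            if cat in cats:
                counts[cat.name] += 1
    return counts


# Example: a vector floating-point instruction with a memory operand
# carries three categories at once.
vec_fp_load = (InstrCategory.VECTOR | InstrCategory.FLOAT
               | InstrCategory.LOAD_STORE)
trace = [InstrCategory.INTEGER, vec_fp_load,
         InstrCategory.BRANCH, InstrCategory.FLOAT]
counts = count_by_category(trace)
```

Because categories overlap, the per-category counts can sum to more than the number of instructions, which an analysis tool would need to account for.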


algra...@gmail.com

Sep 27, 2022, 7:53:29 AM
to DynamoRIO Users
Are there any planned changes to the overall format of the traces, to assist in summarizing and randomly accessing them? I can see DynamoRIO has some recent changes that look like they break up the compression into independent chunks; will the version 2 traces use this? I guess the question is about what sort of tooling you see as being useful with the traces, e.g.

 - high level visualization through some kind of UI, to assist in identifying regions of interest
 - slicing the trace to pick out a region of interest 
 - generating statistics from a trace, perhaps sharding the processing over multiple threads (of the tool)

Is it planned to put the full raw instruction into any of the traces, or will it only ever be the category? I can see it might be tricky for x86, but it's more straightforward for RISCs. Sticking with the category, it could be useful to make more distinctions in the floating-point category, e.g. single vs. double, divide/sqrt as a separate category, and a flag indicating FMA or other fused instructions - that could help with calculating things like Roofline, for example (which we do today with a DynamoRIO plugin).

Atomics (both value-returning and non-value-returning) would also be useful as a category.
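
To show why a fused multiply-add flag matters for Roofline-style analysis, here is a rough sketch using the standard Roofline bound (attainable performance is the minimum of peak compute and operational intensity times peak bandwidth). The `FpCounts` structure and its fields are hypothetical tallies that finer-grained categories might make possible, and the sketch ignores vector width and the single vs. double distinction.

```python
from dataclasses import dataclass


@dataclass
class FpCounts:
    """Hypothetical per-trace tallies derived from instruction categories."""
    fp_ops: int       # non-fused floating-point instructions
    fma_ops: int      # fused multiply-add instructions
    bytes_moved: int  # bytes transferred by loads and stores


def total_flops(c):
    # An FMA performs a multiply and an add, so it counts as two FLOPs.
    return c.fp_ops + 2 * c.fma_ops


def operational_intensity(c):
    # FLOPs per byte of memory traffic.
    return total_flops(c) / c.bytes_moved


def attainable_gflops(intensity, peak_gflops, peak_gb_per_s):
    # Roofline bound: limited by compute or by memory bandwidth.
    return min(peak_gflops, intensity * peak_gb_per_s)


# Made-up numbers purely for illustration.
counts = FpCounts(fp_ops=1000, fma_ops=500, bytes_moved=8000)
intensity = operational_intensity(counts)
bound = attainable_gflops(intensity, peak_gflops=100.0, peak_gb_per_s=50.0)
```

Without the FMA flag, the 500 fused instructions in this example would be counted as 500 FLOPs rather than 1000, understating the intensity.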

Derek Bruening

Sep 27, 2022, 10:33:07 AM
to algra...@gmail.com, DynamoRIO Users
On Tue, Sep 27, 2022 at 7:53 AM algra...@gmail.com <algra...@gmail.com> wrote:
Are there any planned changes to the overall format of the traces, to assist in summarizing and randomly accessing them? I can see DynamoRIO has some recent changes that look like they break up the compression into independent chunks; will the version 2 traces use this?

Yes, the format is changing to break up each thread's file into chunks, as you said.  To your points about random access and intra-thread sharding, this is both for faster seeking and for intra-thread parallel processing, with all "once-only" data repeated in each chunk to facilitate this.  The new format is behind the reader abstraction barrier and will not disrupt any existing workflows.
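
The idea of independently compressed chunks with repeated "once-only" data can be sketched as follows. This is not DynamoRIO's actual on-disk format: the `HEADER` bytes, the fixed-size integer records, and the in-memory index are all assumptions made for illustration.

```python
import struct
import zlib

# Stand-in for "once-only" data (e.g. version info) repeated in every chunk.
HEADER = b"hdr:v1;"


def write_chunks(records, chunk_size):
    """Compress records in independent chunks.

    Returns the concatenated blob and an index of (offset, length) pairs,
    one per chunk.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        payload = HEADER + struct.pack(f"<{len(chunk)}I", *chunk)
        compressed = zlib.compress(payload)
        index.append((len(blob), len(compressed)))
        blob += compressed
    return bytes(blob), index


def read_record(blob, index, chunk_size, ordinal):
    """Fetch one record by decompressing only the chunk that contains it."""
    offset, length = index[ordinal // chunk_size]
    payload = zlib.decompress(blob[offset:offset + length])
    # Each chunk is self-describing because the header is repeated.
    assert payload.startswith(HEADER)
    body = payload[len(HEADER):]
    return struct.unpack_from("<I", body, 4 * (ordinal % chunk_size))[0]


records = list(range(1000))
blob, index = write_chunks(records, chunk_size=64)
value = read_record(blob, index, chunk_size=64, ordinal=777)
```

Since each chunk decompresses on its own, seeking to record 777 touches one chunk instead of the whole stream, and separate chunks can be handed to separate workers.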
 
I guess the question is about what sort of tooling you see as being useful with the traces, e.g.

 - high level visualization through some kind of UI, to assist in identifying regions of interest
 - slicing the trace to pick out a region of interest 
 - generating statistics from a trace, perhaps sharding the processing over multiple threads (of the tool)

Visualization tools could certainly be built.  For the slicing, the faster seeking should make it practical to jump to the start of each region of interest within the existing trace file, with no need to try to extract subsets of the data into new files.
The basic_counts tool computes simple statistics today in parallel over software threads; it could now be parallelized within a thread, as you say.
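
A minimal sketch of intra-thread parallel statistics over chunks is shown below. It is not the basic_counts implementation; the record labels and the `parallel_counts` helper are hypothetical, and the approach only works for statistics whose per-chunk results can be merged independently of order.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(chunk):
    """Compute partial counts for one independently readable chunk."""
    return Counter(chunk)


def parallel_counts(chunks, workers=4):
    """Process the chunks of one thread's trace concurrently and merge."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)  # adds counts rather than replacing them
    return total


# Three chunks from a single software thread, with made-up record labels.
chunks = [
    ["instr", "load"],
    ["instr", "store", "instr"],
    ["load"],
]
totals = parallel_counts(chunks)
```

Python threads are used here only to show the structure; a real tool would do the per-chunk work in native code or separate processes.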
 
Is it planned to put the full raw instruction into any of the traces, or will it only ever be the category? I can see it might be tricky for x86, but it's more straightforward for RISCs.

It is unlikely the Google Workload Traces will be able to contain the instruction encodings, but there is a plan to support embedding all instruction encodings into the trace format in general for other traces, for Java support and to eliminate the need to have the binary around for analysis.
 
Sticking with the category, it could be useful to make more distinctions in the floating-point category, e.g. single vs. double, divide/sqrt as a separate category, and a flag indicating FMA or other fused instructions - that could help with calculating things like Roofline, for example (which we do today with a DynamoRIO plugin).

Atomics (both value-returning and non-value-returning) would also be useful as a category.

This is useful feedback for the categories: thank you.
 

algra...@gmail.com

Nov 15, 2022, 7:54:34 PM
to DynamoRIO Users
Hi, is there any update on the new format? I was thinking that the category field might be useful for other classes of trace record, e.g. prefetches and cache maintenance operations. It could act as a modifier to the TYPE field. The overall record type would be cross-architecture but the details might be architecture-specific. It would avoid proliferating large numbers of TYPE values and allow easier extension.

Also is it planned for this to become the new format for drmemtrace in general (i.e. the new "canonical" format), and not just for Google Workload Traces?
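
The suggestion of using the category field as a modifier to the TYPE field can be sketched roughly as below. All names here (`RecordType`, `PrefetchCategory`, `CacheMaintCategory`, `TraceRecord`, `decode_category`) are hypothetical and do not correspond to the drmemtrace record definitions; the point is only that a small set of cross-architecture types can each carry their own category bits.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class RecordType(Enum):
    """Small, cross-architecture set of record types (hypothetical)."""
    INSTR = auto()
    PREFETCH = auto()
    CACHE_MAINT = auto()


class PrefetchCategory(Flag):
    """Hypothetical modifier bits for prefetch records."""
    READ = auto()
    WRITE = auto()
    NON_TEMPORAL = auto()


class CacheMaintCategory(Flag):
    """Hypothetical modifier bits for cache maintenance records."""
    CLEAN = auto()
    INVALIDATE = auto()


@dataclass(frozen=True)
class TraceRecord:
    type: RecordType
    category: int  # raw modifier bits, interpreted per record type
    addr: int


# Each record type supplies its own interpretation of the category bits.
CATEGORY_DECODERS = {
    RecordType.PREFETCH: PrefetchCategory,
    RecordType.CACHE_MAINT: CacheMaintCategory,
}


def decode_category(rec):
    """Interpret the raw category bits according to the record's type."""
    decoder = CATEGORY_DECODERS.get(rec.type)
    return decoder(rec.category) if decoder else rec.category


bits = (PrefetchCategory.WRITE | PrefetchCategory.NON_TEMPORAL).value
rec = TraceRecord(RecordType.PREFETCH, bits, addr=0x1000)
cats = decode_category(rec)
```

Adding a new prefetch variant then means adding one flag rather than one more top-level TYPE value.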