We hope to release a new set of Google Workload Traces, Version 2, with additional information that should improve the accuracy of studies and analyses of these traces. Google takes user privacy seriously, and when in doubt we take very conservative positions. Consequently, we are limited in what we can share. Within these constraints, we are considering the list below. We are soliciting input on this proposed information: is each item useful as listed? Are there tweaks that would make it more useful?
Below is the list of items we propose to add to these traces. As a disclaimer, this list may need to be revised, and some items may prove infeasible to provide.
Instruction categories: For each instruction fetch, a set of instruction categories will be provided. These categories may include:
Integer operation
Floating-point operation
Vector operation
Logical operation
Load/store
Branch
“Complex” multi-sub-step operation
Barrier/synchronization
System call
A single instruction may combine multiple categories.
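Since a single instruction may carry several categories, one natural way for a consumer to hold them is a bit set. The following is only a sketch of that idea: the `InstrCategory` names and the `count_categories` helper are hypothetical and mirror the list above, not any released trace format.

```python
from enum import Flag, auto


class InstrCategory(Flag):
    """Hypothetical category bits mirroring the proposed list (not a real format)."""
    INTEGER = auto()
    FLOATING_POINT = auto()
    VECTOR = auto()
    LOGICAL = auto()
    LOAD_STORE = auto()
    BRANCH = auto()
    COMPLEX = auto()
    BARRIER_SYNC = auto()
    SYSTEM_CALL = auto()


def count_categories(records):
    """Count how many instructions carry each category bit."""
    counts = {cat: 0 for cat in InstrCategory}
    for cats in records:
        for cat in InstrCategory:
            if cat in cats:
                counts[cat] += 1
    return counts


# A vector floating-point load would set three bits at once.
example = (InstrCategory.VECTOR
           | InstrCategory.FLOATING_POINT
           | InstrCategory.LOAD_STORE)
```

With this layout an instruction is counted once under every category it belongs to, so the per-category counts can sum to more than the instruction count.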
Operand dependencies: For each instruction fetch, the last N instructions on which it depends will be identified, where N is a low number such as 2.
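To make the "last N instructions" idea concrete, here is a rough sketch of how such a list could be derived when full operands are available. It assumes register dependencies only and a hypothetical `(dst_regs, src_regs)` record layout; how the released traces would compute or encode this is not specified here.

```python
def last_n_dependencies(instrs, n=2):
    """For each instruction, return up to n indices of the most recent
    earlier instructions that wrote one of its source registers.

    instrs is a list of (dst_regs, src_regs) tuples (hypothetical layout).
    """
    last_writer = {}  # register name -> index of the latest writer
    deps = []
    for idx, (dsts, srcs) in enumerate(instrs):
        producers = {last_writer[r] for r in srcs if r in last_writer}
        # Keep only the n closest producers, nearest first.
        deps.append(sorted(producers, reverse=True)[:n])
        for r in dsts:
            last_writer[r] = idx
    return deps


trace = [
    (["r1"], []),                  # 0: writes r1
    (["r2"], []),                  # 1: writes r2
    (["r3"], ["r1", "r2"]),        # 2: reads r1, r2
    (["r1"], ["r3"]),              # 3: reads r3, rewrites r1
    ([], ["r1", "r2", "r3"]),      # 4: three producers, truncated to n=2
]
```

The last instruction shows the effect of a small N: it has three producers, but only the two nearest are kept.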
Context switch dilation: The tracing overhead increases the context switch frequency per retired instruction, and the size of this increase varies by workload. The per-workload factor will be included.
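As an illustration of how a simulator might apply such a factor, the sketch below assumes the factor is defined as the ratio of traced to untraced context switch frequency per instruction (so it is at least 1). That definition and the `undilate_interval` name are assumptions for this example, not part of the proposal.

```python
def undilate_interval(traced_instr_count, dilation_factor):
    """Estimate the instructions between context switches without tracing.

    Assumes dilation_factor = traced switch frequency / untraced switch
    frequency, both measured per retired instruction.
    """
    if dilation_factor < 1:
        raise ValueError("dilation factor is assumed to be >= 1")
    return round(traced_instr_count * dilation_factor)
```

For example, if the trace shows a switch every 50,000 instructions and the factor is 4, the untraced interval would be estimated at 200,000 instructions.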
Suggested thread scheduling for same core configuration: A recommended scheduling of software thread segments onto cores will be provided, one that undoes the context switch overhead while maintaining thread dependencies. It is intended for use when simulating the same core configuration as in the traced environment.
Virtual-to-physical mapping: A suggested mapping of virtual to physical pages will be included. This may not be the actual mapping during tracing but a reasonable substitute.
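For comparison, a simulator with no supplied mapping often falls back on something like the first-touch scheme sketched below, which hands out physical frames in the order virtual pages are first seen. This is a stand-in for illustration only: the 4 KiB page size and the `FirstTouchMapper` class are assumptions, and the suggested mapping shipped with the traces may look quite different.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class FirstTouchMapper:
    """Assign physical frames sequentially in order of first access."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.frames = {}  # virtual page number -> physical frame number

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, self.page_size)
        # A page seen for the first time gets the next unused frame.
        pfn = self.frames.setdefault(vpn, len(self.frames))
        return pfn * self.page_size + offset
```

A first-touch scheme packs pages densely and ignores fragmentation, which is one reason a suggested mapping derived from the traced environment could be a more realistic substitute.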
Multi-tenant mixes: A suggested combination of workload traces to simulate together to study whole machine loads.
Are there any planned changes to the overall format of the traces, to assist in summarizing and randomly accessing them? I can see DynamoRIO has some recent changes that look like they break up the compression into independent chunks. Will the Version 2 traces use this?
I guess the question is about what sort of tooling you see as being useful with the traces, e.g.
- high-level visualization through some kind of UI, to assist in identifying regions of interest
- slicing the trace to pick out a region of interest
- generating statistics from a trace, perhaps sharding the processing over multiple threads (of the tool)
Is it planned to put the full raw instruction into any of the traces, or will it only ever be the category? I can see it might be tricky for x86, but it's more straightforward for RISCs.
Sticking with the category, it could be useful to make more distinctions in the floating-point category, e.g. single vs. double, divide/sqrt as a separate category, and a flag indicating FMA or other fused instructions. That could help with calculating things like Roofline, for example (which we do today with a DynamoRIO plugin).
Atomics (both value-returning and non-value-returning) would also be useful as a category.
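To show why the FMA flag matters for Roofline, here is a minimal sketch of an arithmetic-intensity calculation over trace records. The `Record` fields (`is_fp`, `is_fma`, `lanes`, `mem_bytes`) are hypothetical and stand in for whatever the categories would expose; this is not the plugin mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical per-instruction summary (illustrative fields only)."""
    is_fp: bool = False
    is_fma: bool = False
    lanes: int = 1       # vector lanes operated on
    mem_bytes: int = 0   # bytes loaded or stored


def arithmetic_intensity(records):
    """Return FLOPs per byte moved, or None if no memory traffic was seen."""
    flops = 0
    bytes_moved = 0
    for r in records:
        if r.is_fp:
            # A fused multiply-add counts as two operations per lane.
            flops += r.lanes * (2 if r.is_fma else 1)
        bytes_moved += r.mem_bytes
    return flops / bytes_moved if bytes_moved else None


sample = [
    Record(mem_bytes=32),
    Record(is_fp=True, is_fma=True, lanes=4),
    Record(mem_bytes=32),
]
```

Without the FMA flag the 4-lane instruction in `sample` would be counted as 4 FLOPs rather than 8, halving the computed intensity.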
On Monday, September 26, 2022 at 10:07:10 PM UTC+1 Derek Bruening wrote: