Announcing Google Workload Traces

Derek Bruening

Apr 13, 2022, 3:15:42 PM
to DynamoRIO Users
Google is sharing instruction and memory address traces from workloads running in Google data centers so that computer architecture researchers can study and develop new architecture ideas to improve the performance and efficiency of this important class of workloads.  The traces are shared in the same format used by the drmemtrace and drcachesim suite of trace gathering and analysis tools that are part of DynamoRIO.

See https://dynamorio.org/google_workload_traces.html for information on how to access these traces.

Zhibin Yu

May 7, 2022, 4:42:24 AM
to DynamoRIO Users
Hi Derek,
Can we open the xxx.memtrace file in a text editor to take a look?
Best regards,
Zhibin

Derek Bruening

May 8, 2022, 10:15:50 PM
to Zhibin Yu, DynamoRIO Users
It is a binary format (described at https://dynamorio.org/sec_drcachesim_format.html).
The plan is to augment the view tool for traces without full instruction encodings (we cannot make those available at this time) to make it easy to get a quick human-readable snapshot: https://github.com/DynamoRIO/dynamorio/issues/5486.
The raw bits can be observed as described here, but programmatic access should use the analysis tool interface.
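For a quick look at the raw bits outside the analysis tool interface, here is a minimal Python sketch. It assumes each record is a packed, little-endian 12-byte entry (2-byte type, 2-byte size, 8-byte address or value), which is consistent with the hex dump posted later in this thread; that layout and the helper names `iter_records` and `read_memtrace_gz` are assumptions for illustration, and the format documentation linked above remains authoritative.

```python
import gzip
import struct
from typing import Iterator, List, Tuple

# ASSUMED record layout: packed little-endian u16 type, u16 size, u64 addr/value.
RECORD = struct.Struct("<HHQ")


def iter_records(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (type, size, addr) tuples from raw, already-decompressed trace bytes."""
    usable = len(data) - len(data) % RECORD.size  # drop a trailing partial record
    yield from RECORD.iter_unpack(data[:usable])


def read_memtrace_gz(path: str, limit: int = 10) -> List[Tuple[int, int, int]]:
    """Return the first `limit` records of a gzip-compressed .memtrace file."""
    with gzip.open(path, "rb") as f:
        data = f.read(limit * RECORD.size)
    return list(iter_records(data))
```

For example, `read_memtrace_gz(path, limit=5)` on one of the downloaded `.memtrace.gz` files returns the first five raw records. This only decodes the fixed-size fields; interpreting marker subtypes and the rest of the stream is what the analysis tools are for.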

Wenqi

May 12, 2022, 1:21:00 AM
to DynamoRIO Users
Hi Derek, 

Just to make sure I understand correctly: these traces contain only the addresses of memory instructions and the addresses and targets of branch instructions, and we cannot reconstruct the execution flow of the workload as we can with a normal DynamoRIO offline trace, because the binaries and linked libraries are not released (so we don't have the modules.log file).

Best,
Wenqi

Derek Bruening

May 12, 2022, 9:59:32 AM
to Wenqi, DynamoRIO Users
Correct, these initial traces are best suited to front-end studies or various types of memory or cache analysis.
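As an illustration of the kind of cache analysis these traces support, here is a minimal Python sketch of a set-associative LRU cache fed with the data addresses from read/write records. The cache geometry and the names `LRUCache` and `simulate` are hypothetical; drcachesim's own cache simulator is the real tool and models far more detail.

```python
from collections import OrderedDict
from typing import Iterable, List


class LRUCache:
    """Set-associative cache with LRU replacement that counts hits and misses."""

    def __init__(self, num_sets: int, ways: int, line_size: int = 64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set; keys are line numbers, ordered from LRU to MRU.
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Simulate one access to byte address `addr`; return True on a hit."""
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[line] = True
        return False


def simulate(addresses: Iterable[int], cache: LRUCache) -> float:
    """Feed data addresses through the cache and return the miss ratio."""
    count = 0
    for addr in addresses:
        cache.access(addr)
        count += 1
    return cache.misses / count if count else 0.0
```

With a single 2-way set, the access sequence 0, 8, 64, 128, 0 gives one hit (8 shares line 0) and four misses, since 128 evicts line 0 before it is reused.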

It may be possible to provide further information in a future version of the traces to allow additional uses.  We are interested in hearing about what types of minimal information would enable further analyses that you would like to pursue.
For example, would broad opcode classes (ALU, SIMD, etc.) plus some sort of virtual register operand information that allowed dependence analysis be enough to open up new types of experiments?
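To make the idea concrete, here is a hypothetical Python sketch: if each record carried an opcode class and virtual register operands, a single forward pass could find which instructions (transitively) consume a loaded value. The `Instr` record, its fields, and `load_dependents` are assumptions for illustration only, not part of the current trace format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instr:
    """HYPOTHETICAL enriched record: opcode class plus virtual register operands."""
    op_class: str                  # e.g. "LOAD", "STORE", "ALU", "SIMD"
    srcs: Tuple[int, ...] = ()     # virtual source register ids
    dsts: Tuple[int, ...] = ()     # virtual destination register ids


def load_dependents(trace: List[Instr]) -> List[int]:
    """Return indices of instructions that read a value derived from a load."""
    from_load: Dict[int, bool] = {}  # virtual reg -> currently holds a load-derived value?
    result = []
    for i, ins in enumerate(trace):
        uses_load = any(from_load.get(r, False) for r in ins.srcs)
        if uses_load:
            result.append(i)
        # A load always produces a load-derived value; other ops propagate it.
        produced = ins.op_class == "LOAD" or uses_load
        for r in ins.dsts:
            from_load[r] = produced
    return result
```

A timing model could then stall exactly these instructions on the simulated load latency, which is the kind of end-to-end estimate the reply below asks about.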

- Derek

Wenqi

May 12, 2022, 11:29:32 PM
to DynamoRIO Users
Hi Derek, 

Thanks for the reply. We mainly study the memory system and would like to see what end-to-end performance impact (e.g., on IPC or QPS) a specific memory-system design change has. As you said, extra information on the category of the remaining instructions and their dependence on load/store values should suffice for such analyses.

Best, 
Wenqi

Derek Bruening

May 14, 2022, 12:01:40 PM
to Zhibin Yu, DynamoRIO Users
The augmented view tool is in this week's build: https://github.com/DynamoRIO/dynamorio/releases/tag/cronbuild-9.0.19124.
You can point it at a directory, in which case it will interleave the threads by timestamp, or at a single file:

$ bin64/drrun -t drcachesim -simulator_type view -infile merced/merced_trace-1_13378400607429273214.462774.memtrace.gz 2>&1 | head -16
Output format:
<record#>: T<tid> <record details>
------------------------------------------------------------
        1: T462774 <marker: version 3>
        2: T462774 <marker: filetype 0x40>
        3: T462774 <marker: cache line size 64>
        4: T462774 <marker: tid 462774 on core 73>
        5: T462774 <marker: timestamp 13282653165917715>
        6: T462774 ifetch 3 byte(s) @ 0x00007f6cae908a29 non-branch
        7: T462774 ifetch 6 byte(s) @ 0x00007f6cae908a2c non-branch
        8: T462774 ifetch 2 byte(s) @ 0x00007f6cae908a32 conditional jump
        9: T462774 ifetch 2 byte(s) @ 0x00007f6cae908a52 non-branch
       10: T462774 ifetch 7 byte(s) @ 0x00007f6cae908a54 non-branch
       11: T462774 read   8 byte(s) @ 0x00007f6cae90ca20 by PC 0x00007f6cae908a54
       12: T462774 ifetch 3 byte(s) @ 0x00007f6cae908a5b non-branch
       13: T462774 write  4 byte(s) @ 0x00007f6cad0387c0 by PC 0x00007f6cae908a5b
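The directory mode's interleaving can be pictured with a simplified Python sketch: treat each thread's file as a sequence of chunks that each begin with a timestamp marker, and merge the chunks across threads in timestamp order. This is an assumption-laden model for illustration; the real reader handles many more marker types and details, and `interleave_by_timestamp` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple

# HYPOTHETICAL simplified model: each thread's trace is a list of
# (timestamp, records) chunks, already sorted by timestamp.
Chunk = Tuple[int, List[str]]


def interleave_by_timestamp(threads: Dict[int, List[Chunk]]) -> List[Tuple[int, str]]:
    """Merge per-thread chunks into one (tid, record) stream ordered by chunk timestamp."""
    streams = [
        [(ts, tid, recs) for ts, recs in chunks]
        for tid, chunks in threads.items()
    ]
    merged: List[Tuple[int, str]] = []
    # heapq.merge lazily merges already-sorted inputs; the key compares timestamps only.
    for _ts, tid, recs in heapq.merge(*streams, key=lambda chunk: chunk[0]):
        merged.extend((tid, rec) for rec in recs)
    return merged
```

Merging whole chunks rather than individual records reflects that timestamps appear only at chunk boundaries in this model, so finer-grained ordering between threads is not recoverable.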

mahmo abdallah

May 5, 2023, 7:25:32 PM
to DynamoRIO Users
Hello Derek,

I used the most recent release of DynamoRIO and tried the above command to parse the Google workload traces. I am running on a Windows machine with MINGW64. However, I get the following "Invalid header" error:

~/Downloads/DynamoRIO-Windows-9.93.19475/DynamoRIO-Windows-9.93.19475$ bin64/drrun -t drcachesim -simulator_type view -infile ./10058381926338669845.507190.memtrace.gz
Invalid header
Failed to initialize scheduler: Failed to open ./10058381926338669845.507190.memtrace.gz
ERROR: failed to initialize analyzer

I am able to parse the file with zcat, though:

$ zcat /c/Users/mabdalla/AppData/Local/Google/Cloud\ SDK/delta/trace-1/100583819 $1, $2, $3, $7, $6, $5, $4}' | head -5
000000 | 0019 0000 0000000000000003
00000c | 001c 000c 0000000000000003
000018 | 001c 0009 0000000000000040
000024 | 0016 0004 000000000007bd36
000030 | 0018 0004 000000000007bd36
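The same view of the raw records can be produced without a shell pipeline. Here is a minimal Python sketch, assuming the packed little-endian 12-byte record layout (2-byte type, 2-byte size, 8-byte value) that this dump suggests; `dump_records` is a hypothetical helper, not part of DynamoRIO.

```python
import struct
from typing import List

# ASSUMED record layout: packed little-endian u16 type, u16 size, u64 value.
RECORD = struct.Struct("<HHQ")


def dump_records(data: bytes, limit: int = 5) -> List[str]:
    """Format raw records as 'offset | type size value' lines, like the dump above."""
    lines = []
    for i in range(min(limit, len(data) // RECORD.size)):
        offset = i * RECORD.size
        rec_type, size, value = RECORD.unpack_from(data, offset)
        lines.append(f"{offset:06x} | {rec_type:04x} {size:04x} {value:016x}")
    return lines
```

To use it on a trace, read the first 60 bytes with `gzip.open(path, "rb").read(60)` and print the returned lines. As a sanity check on the decoding, 0x7bd36 in the last two records is 507190 in decimal, matching the number in the file name.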

Could you also please point me to the entry point in the source code where these traces are parsed? Which file should I look at?

Thank you!

Derek Bruening

May 10, 2023, 3:56:58 PM
to mahmo abdallah, DynamoRIO Users
Regarding which file: there are several; searching the source for "Invalid header" and stepping to that point in a debugger is probably the simplest approach.
It may be that the Windows cronbuild release does not have zlib linked in and so cannot read compressed files (if you know of a simple way to install zlib on GitHub Actions Windows runners and would like to add it to our workflow files, that would be great).
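If a missing zlib is indeed the cause, one possible workaround (untested here, and assuming the build can read an uncompressed trace file) is to decompress the trace first and pass the uncompressed file to `-infile`. A minimal Python sketch follows; the helper name `decompress` is hypothetical, and `zcat file.memtrace.gz > file.memtrace` in MINGW64 would do the same thing.

```python
import gzip
import shutil


def decompress(src: str, dst: str) -> None:
    """Write an uncompressed copy of the gzip-compressed trace at src to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Copies in fixed-size chunks, so large traces are never held fully in memory.
        shutil.copyfileobj(fin, fout)
```

For example, `decompress("10058381926338669845.507190.memtrace.gz", "10058381926338669845.507190.memtrace")`, then rerun drrun with the uncompressed path.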

Reply all
Reply to author
Forward
0 new messages