Instance_events table issue

Shining Flag

Aug 4, 2025, 3:51:55 AM
to Google cluster data - discussions
Hello sir, I'm currently a PhD student from Singapore.
Recently I downloaded the trace files and tried to extract some useful information for my study (files 00 to 56, hundreds of GB after decompression).
I want to transform the raw event information into a "job - task" format: something like a .csv file in which each line contains a job, its arrival time, the average memory request of its tasks, the duration of each of its tasks, etc.
After this conversion, I was surprised to find that most of the jobs contain fewer than 3 tasks, far fewer than I expected. In fact, the average number of tasks per job in each raw file ranges from 3 to 7, and only about 5% of jobs contain more than 100 tasks.
My question is: is this the correct workload distribution in the Google Trace 2019? Or did I make a mistake in the code (or read the guide incorrectly), so that it derives an incorrect task count?
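
For reference, here is a minimal sketch of the kind of conversion described above. It is not the original code, only an illustration under stated assumptions: each event is a dict with the keys `time`, `type`, `collection_id`, and `instance_index`, plus an optional flattened `memory` key (a hypothetical name for the memory part of the resource request); event type 3 means SCHEDULE and types 4 to 8 mean the task stopped running. These codes and field names should be checked against the trace documentation. The function name `jobs_from_events` is made up for this example.

```python
from collections import defaultdict

# Assumed event-type codes; verify against the trace documentation.
SCHEDULE = 3
END_TYPES = {4, 5, 6, 7, 8}  # assumed: EVICT, FAIL, FINISH, KILL, LOST


def jobs_from_events(events):
    """Group instance events into one summary row per job.

    `events` is any iterable of dicts with the assumed keys described above.
    Events from all input files should be passed in together if a job's
    tasks can be spread over several files (an assumption to verify).
    """
    # collection_id -> instance_index -> per-task bookkeeping
    tasks = defaultdict(dict)
    for ev in sorted(events, key=lambda e: e["time"]):
        task = tasks[ev["collection_id"]].setdefault(
            ev["instance_index"],
            {"first_seen": ev["time"], "memory": None, "start": None, "duration": 0},
        )
        # Keep the first memory request seen for this task.
        if ev.get("memory") is not None and task["memory"] is None:
            task["memory"] = ev["memory"]
        if ev["type"] == SCHEDULE:
            task["start"] = ev["time"]
        elif ev["type"] in END_TYPES and task["start"] is not None:
            # Accumulate running time across repeated schedule/stop cycles.
            task["duration"] += ev["time"] - task["start"]
            task["start"] = None

    rows = []
    for job_id, job_tasks in tasks.items():
        mems = [t["memory"] for t in job_tasks.values() if t["memory"] is not None]
        rows.append({
            "job_id": job_id,
            "arrival_time": min(t["first_seen"] for t in job_tasks.values()),
            "num_tasks": len(job_tasks),
            "avg_memory_request": sum(mems) / len(mems) if mems else None,
            "task_durations": [t["duration"] for t in job_tasks.values()],
        })
    return rows
```

In this sketch a task is identified by the pair (`collection_id`, `instance_index`), so the task count per job is the number of distinct instance indices seen for that job. If the grouping is done separately for each raw file instead of over all files at once, a job whose events are split across files would be counted with fewer tasks in each file.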