Instance_events table issue

Shining Flag

Aug 4, 2025, 3:51:55 AM

to Google cluster data - discussions

Hello sir, I'm currently a PhD student from Singapore.

Recently I downloaded the trace files (00 to 56, hundreds of GB after decompressing) and tried to extract some information useful for my study.

I want to transform the raw event information into a "job - task" format: something like a .csv file in which each line contains a job, its arrival time, the average memory request of its tasks, the duration of each of its tasks, etc.
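A minimal sketch of that conversion in Python. It assumes the instance_events shards are gzipped, newline-delimited JSON and that the relevant fields are named `collection_id`, `instance_index`, `time` and `resource_request.memory`; treat those names, and the choice of a job's earliest event time as its arrival time, as assumptions to check against the trace's schema document. `iter_events`, `summarize_jobs` and `write_csv` are hypothetical helper names, and task durations are left out to keep it short.

```python
import csv
import gzip
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional


def iter_events(paths: Iterable[str]) -> Iterator[dict]:
    """Yield one event per line from gzipped, newline-delimited JSON shards."""
    for path in paths:
        with gzip.open(path, "rt") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


def summarize_jobs(events: Iterable[dict]) -> Dict[str, dict]:
    """Aggregate instance events per collection_id over all shards at once.

    Field names are ASSUMED from the 2019 trace schema. Integer fields may be
    JSON strings, so they are cast explicitly.
    """
    arrival: Dict[str, int] = {}
    # collection_id -> {instance_index: latest memory request seen, or None}
    tasks: Dict[str, Dict[int, Optional[float]]] = defaultdict(dict)
    for ev in events:
        cid = str(ev["collection_id"])
        t = int(ev.get("time", 0))
        # ASSUMPTION: earliest event time stands in for the job's arrival time.
        arrival[cid] = min(t, arrival.get(cid, t))
        idx = int(ev.get("instance_index", 0))
        mem = (ev.get("resource_request") or {}).get("memory")
        if mem is not None:
            tasks[cid][idx] = float(mem)
        else:
            tasks[cid].setdefault(idx, None)

    summary = {}
    for cid, per_task in tasks.items():
        mems = [m for m in per_task.values() if m is not None]
        summary[cid] = {
            "arrival": arrival[cid],
            "num_tasks": len(per_task),  # distinct instance_index values
            "avg_mem_request": sum(mems) / len(mems) if mems else None,
        }
    return summary


def write_csv(summary: Dict[str, dict], out_path: str) -> None:
    """Write one row per job, ordered by arrival time."""
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["collection_id", "arrival", "num_tasks", "avg_mem_request"])
        for cid, row in sorted(summary.items(), key=lambda kv: kv[1]["arrival"]):
            # csv.writer writes None as an empty field.
            w.writerow([cid, row["arrival"], row["num_tasks"], row["avg_mem_request"]])
```

For example, `write_csv(summarize_jobs(iter_events(sorted(glob.glob("instance_events-*.json.gz")))), "jobs.csv")`, where the file-name pattern is an assumption. Because the grouping runs over all shards in a single pass, a job whose events are spread across several files is counted once with all of its tasks; the trade-off is that one dict entry per task is held in memory, which may need a lot of RAM for the full trace.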

After this conversion, I was surprised to find that most jobs contain fewer than 3 tasks, far fewer than I expected. In fact, the average number of tasks per job in each raw file ranges from 3 to 7, and only about 5% of jobs contain more than 100 tasks.
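Given a mapping from job ID to task count (for instance, the task-count column of the converted .csv), the figures quoted above can be recomputed over the whole trace with a small sketch like this. `task_count_stats` is a hypothetical helper, and the default thresholds 3 and 100 simply mirror the numbers in the question.

```python
from statistics import mean, median
from typing import Mapping


def task_count_stats(num_tasks: Mapping[str, int], small: int = 3, big: int = 100) -> dict:
    """Summarize the tasks-per-job distribution for a job_id -> task count mapping."""
    counts = list(num_tasks.values())
    n = len(counts)
    if n == 0:
        raise ValueError("no jobs to summarize")
    return {
        "jobs": n,
        "mean_tasks": mean(counts),
        "median_tasks": median(counts),
        # Fraction of jobs with fewer than `small` tasks.
        "frac_below_small": sum(c < small for c in counts) / n,
        # Fraction of jobs with more than `big` tasks.
        "frac_above_big": sum(c > big for c in counts) / n,
    }
```

Reporting the median alongside the mean shows whether a small fraction of very large jobs is what pulls the average up to the 3 to 7 range while most jobs stay tiny.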

My question is: is this the correct workload distribution in the Google Trace 2019, or did I make a mistake in my code (or misread the guide) that leads to incorrect task counts?
