Trace scheduler reports "Failed to open file..."


York Ma

May 20, 2025, 10:21:34 AM
to DynamoRIO Users
I have a trace converter that converts google-trace v2 into my own trace format, based on the trace scheduler.
The converter has worked well for several trace collections, such as delta, tahoe, and whiskey.
However, on some collections, such as tango, the trace scheduler reports an error message like "Failed to open /path/to/external-traces-v2/tango/2523714004182984823.865263.memtrace.zip" and stops (error_code=2).
I am sure the underlying file is good (I verified it by uncompressing it), and re-downloading the file does not help.
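A quick way to run that kind of integrity check without fully extracting the archive is Python's `zipfile.ZipFile.testzip()`, which CRC-checks every member. This is a self-contained sketch: the demo archive is created on the fly, since the real memtrace path above is environment-specific.

```python
import zipfile

def archive_is_good(path):
    """CRC-check every member; testzip() returns the first bad name, or None."""
    with zipfile.ZipFile(path) as zf:
        return zf.testzip() is None

# Demo on a freshly written archive (the real memtrace .zip path would go here).
demo = "/tmp/demo.memtrace.zip"
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("demo.memtrace", b"dummy trace bytes")
print(archive_is_good(demo))  # True for an intact archive
```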

Is the trace file in some format that the trace scheduler does not support?
The same error also occurs on the charlie and bravo.a collections, among others.
Any suggestion?

Enrico Deiana

May 20, 2025, 1:41:23 PM
to DynamoRIO Users
Hi!

I couldn't reproduce your error.
After downloading tango/trace/2523714004182984823.865263.memtrace.zip (<--- the problematic file you mentioned) and running the schedule_stats tool (which uses the scheduler) on it with `drrun -t drmemtrace -simulator_type schedule_stats -indir /tmp/tango/trace -core_sharded -cores 4`, the tool completes successfully.
If you put one of the problematic files in its own directory and run schedule_stats on it as shown above, does it work on your end?
Also, what version of DynamoRIO are you using? Is it the current master or 11.3?

It looks like a file corruption issue.
We do plan to add md5 checksums.

- Enrico

York Ma

May 23, 2025, 6:14:28 AM
to Enrico Deiana, DynamoRIO Users
Hi,

I found something interesting. The failure only occurs when feeding a large set of traces as the input. The scheduler works well if you feed only the single trace in question, but feeding the whole tango set hits the bug. You can reproduce the failure with scheduler_launcher using this command:

    /DR_build_path/clients/bin64/scheduler_launcher /path/to/external-traces-v2/tango



York Ma

May 23, 2025, 6:18:15 AM
to Enrico Deiana, DynamoRIO Users
Sorry, the command to reproduce the error should be

    /DR_build_path/clients/bin64/scheduler_launcher --trace_dir /path/to/external-traces-v2/tango

And it is at commit 33d4eca7f0125033c6c4ac58b41944ab35507307.

Enrico Deiana

Jun 2, 2025, 12:05:16 PM
to DynamoRIO Users
Apologies for the delay.
I downloaded the whole tango trace set to my local machine and ran your command, but it succeeded; I have not seen your error.
Here's my output: https://gist.github.com/edeiana/aed35d217556582a9125608d384b2056 (I truncated the repeated parts; it's quite long otherwise).

Can you give us more details on the machine where you are running scheduler_launcher?
x86/aarch64? How many CPUs? How much memory? OS? etc.

- Enrico

York Ma

Jun 3, 2025, 1:40:28 AM
to Enrico Deiana, DynamoRIO Users
Here's the information about my environment:
- CPU: x86 (Intel Xeon Platinum 8460Y+)
- #CPUs: 80
- OS: Red Hat Enterprise Linux (4.18.0-513.24.1.el8_9.x86_64)
- MemTotal: 1584279228 kB
- SwapTotal: 134143996 kB
- Percpu: 302080 kB

York Ma

Jun 4, 2025, 3:42:32 AM
to Enrico Deiana, DynamoRIO Users
Hi,

The problem is solved.
The cause was that the number of open files in the process exceeded the limit set by the OS.
Raising the limit with the setrlimit system call solved the problem.
Thank you for all your help.
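For reference, the fix boils down to something like the following (a Python sketch using the `resource` module; the converter itself presumably calls setrlimit(2) from its own language):

```python
import resource

# Raise the soft open-file limit (RLIMIT_NOFILE) up to the hard limit, so the
# scheduler can hold a whole trace collection open at once. Raising the hard
# limit itself requires root privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```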

Derek Bruening

Jun 4, 2025, 4:36:00 PM
to York Ma, Enrico Deiana, DynamoRIO Users

Enrico Deiana

Jun 5, 2025, 6:19:26 PM
to DynamoRIO Users
Thank you for letting us know what the problem was!
We updated the Google Workload Traces doc to let other users know that this could be an issue depending on their Linux distribution and its default "nofile" limit.
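A quick sanity check along those lines (a sketch; the helper name and demo directory are made up here) is to compare a collection's file count against the process's soft "nofile" limit before scheduling:

```python
import os
import resource
import tempfile

def fits_nofile_limit(trace_dir):
    """Rough check: is the soft RLIMIT_NOFILE above the trace file count?"""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return len(os.listdir(trace_dir)) < soft

# Demo on a small temporary directory standing in for a trace collection.
demo_dir = tempfile.mkdtemp()
for i in range(3):
    open(os.path.join(demo_dir, f"trace{i}.zip"), "w").close()
print(fits_nofile_limit(demo_dir))  # True: 3 files is well under any default limit
```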

- Enrico
