Sharing client data across Forks


Mohammad Ewais

Oct 4, 2025, 12:00:08 AM
to DynamoRIO Users
Hi,

I have used DynamoRIO to build a computer architecture simulator, and I've been using it for a year or so with great results.

Generally speaking, I create the simulation infrastructure (models for cores, caches, etc.) within the client; then, on every thread creation (using drmgr_register_thread_init_event), I assign the new thread to a core and instrument its BBs and memory accesses on that core, etc.

Typically, I have been dealing with multi-threaded workloads/benchmarks, and things were running flawlessly. I recently encountered a multi-process benchmark, so I had to extend the simulator to deal with forks: I added a dr_register_fork_init_event call and passed it the same function I pass to drmgr_register_thread_init_event.
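
Roughly, the registration looks like this (simplified sketch; assign_thread_to_core stands in for my actual callback):

#include "dr_api.h"
#include "drmgr.h"

static void assign_thread_to_core(void *drcontext); /* per-thread setup */

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    drmgr_init();
    /* Runs for every new thread in this process. */
    drmgr_register_thread_init_event(assign_thread_to_core);
    /* Runs in the child after a fork: reuse the same callback
     * so the forked process's initial thread also gets a core. */
    dr_register_fork_init_event(assign_thread_to_core);
}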

The problem: I think DR is replicating/copying all my client data (including the simulation infrastructure) for the forked processes. For example, I use a simple core counter to track how many cores I have used; I print it (and its address) during the thread/fork init callbacks, and this is what I get:
Core Index was 0 0x7fa1e3804a58 # First process
Core Index is now 1 0x7fa1e3804a58
Core Index was 1 0x7fa1e3804a58 # Forked Process 1
Core Index is now 2 0x7fa1e3804a58
Core Index was 1 0x7fa1e3804a58 # Forked Process 2
Core Index is now 2 0x7fa1e3804a58
Core Index was 1 0x7fa1e3804a58 # Forked Process 3
Core Index is now 2 0x7fa1e3804a58


Is there a way to keep my DR client from "forking itself" along with the target process? And keep my data shared across them? I basically want to treat forks the same way as threads.

Mohammad Ewais

Oct 4, 2025, 12:03:47 AM
to DynamoRIO Users
Forgot to mention, I am using DR8 (a bit old, I know, but development took too long to finish, and upgrading was an extra risk of things breaking). I also run with the options `-thread_private` and `-disable_traces`, though I doubt the options have anything to do with it.

Abhinav Sharma

Oct 6, 2025, 11:24:27 AM
to DynamoRIO Users
Hi,

>  I also run with the options `-thread_private` and `-disable_traces`. Though I doubt the options have anything to do with it.

-thread_private affects code cache operation within a process, not across processes. -disable_traces is orthogonal.

> Is there a way to keep my DR client from "forking itself" along with the target process?

Note that DR logic is generally not run in separate threads, but as part of the app threads themselves.

> And keep my data shared across them? I basically want to treat forks the same way as threads.

There's no support for sharing client state across multiple processes running under DR. The client itself could set up some sort of inter-process communication between its instances in the different processes, but I'm not sure how feasible that is: there are various unknowns (e.g., efficiency issues and possible transparency violations from the IPC), and it may not be the simplest solution.

From your description, I gather that your simulation infrastructure operates "online", that is, while the target program is executing. Would it be any better if you were to use the offline drmemtrace traces that are written to disk and used for analysis/simulation later in a separate run? Our drmemtrace scheduler has some preliminary support for combining multiple workload traces.

Abhinav

Mohammad Ewais

Oct 6, 2025, 2:39:51 PM
to DynamoRIO Users
Hey Abhinav,

Thanks a lot for your reply. Unfortunately, the online functionality is needed. My simulator DOES use shared memory for some things; I guess my only option now is to create the infrastructure within that shared memory so it can be used by everything.

Thanks a lot.

Derek Bruening

Oct 7, 2025, 11:30:29 AM
to Mohammad Ewais, DynamoRIO Users
Right, you would need to set up what you want to do in the fork event.  Typically a client would want process-private state and would use the fork event to reset most of its state for the new process.  For process-shared: as noted, you would have to set up IPC yourself and come up with a synchronization protocol if something is shared-writable.  There are existing cases of process-shared clients.  The simplest is what drmemtrace's online tracing does, where each application process sends data over pipes to a single separate analyzer process that is shared among all app processes.
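
For the reset pattern, a minimal sketch (core_count stands in for whatever per-process client state you have):

static int core_count; /* process-private client state */

static void
fork_init(void *drcontext)
{
    /* The child inherits a copy of the parent's state across fork(),
     * so reset anything that should be per-process here. */
    core_count = 0;
}
/* ...in dr_client_main: */
dr_register_fork_init_event(fork_init);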


Mohammad Ewais

Oct 8, 2025, 11:32:19 AM
to DynamoRIO Users
I have a follow-up question then.

I would generally prefer shared memory over IPC since I want to access the simulation infrastructure on every instruction and memory access, and shared memory avoids any added latency. Plus, I already have it set up for other reasons; all I need is to create the simulation infrastructure in shared memory instead.

Here's the challenging bit. My simulation infrastructure has a lot of classes, with inheritance and virtual functions. That means they use vtables and type_info data structures, which most probably reside in the .rodata section of the client module (please correct me if I am wrong). Even if I allocate my objects in shared memory, these vtables and type_infos cannot be moved and will still reside outside the shared memory. I suppose this is not an issue when the target app forks, since my client will also fork with it, keeping the same .rodata section at the same address. The objects will then remain in the shared memory, accessible by the two processes/clients, and each client will have a separate copy (at the same address) of the vtables and type_infos. Things will still work.
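
A toy example of what I mean (Core stands in for one of my simulator classes):

#include <new>
#include <sys/mman.h>

struct Core {                       // stand-in for a simulator class
    virtual ~Core() {}
    virtual void tick() {}
};

void make_shared_core() {
    // Anonymous MAP_SHARED memory stays shared with children across fork().
    void *shm = mmap(NULL, sizeof(Core), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    // Placement-new puts the object's bytes in shared memory, but its
    // hidden _vptr still points at Core's vtable in this client's image.
    Core *c = new (shm) Core();
}
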
The problem is if/when the forked application decides to execve. I don't know what happens with DR in this case. Will the client's module stay at the same address? Or does execve force its reloading, potentially at a different address?

Thanks.

Derek Bruening

Oct 8, 2025, 12:37:56 PM
to Mohammad Ewais, DynamoRIO Users
On Wed, Oct 8, 2025 at 11:32 AM Mohammad Ewais <mohammad...@gmail.com> wrote:

> Even if I allocate my objects in shared memory, these vtables and type_infos cannot be moved and will still reside outside the shared memory. [...]
> The problem is if/when the forked application decides to execve. I don't know what happens with DR in this case. Will the client's module stay at the same address? Or does execve force its reloading, potentially at a different address?

Read-only parts of your library will not be copied: they will share the same physical backing.  The virtual address could change, but I'm not sure why that would matter?
Writable data you would have to reserve for process-private state only.  Anything needed globally by this multi-process simulator would have to be in the shared memory.
The execve will reset and load from scratch.  You would have to use DR options or some other mechanism to get the new instance of your client to find the shared memory.
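
For example, something along these lines might work in the new instance (a sketch; the -shm option name, /sim_shm, and SIM_SHM_SIZE are all made up here):

#include "dr_api.h"
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIM_SHM_SIZE (64 * 1024 * 1024) /* made-up fixed size */

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    /* Pass the name through the client options so the post-execve
     * instance can find it, e.g.: drrun -c libsim.so -shm /sim_shm -- app */
    const char *shm_name = "/sim_shm";
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-shm") == 0)
            shm_name = argv[i + 1];
    }
    int fd = shm_open(shm_name, O_RDWR, 0);
    if (fd != -1) {
        void *base = mmap(NULL, SIM_SHM_SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        close(fd); /* the mapping keeps the shared object alive */
        /* ... attach the simulator state at base ... */
    }
}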

Mohammad Ewais

Oct 8, 2025, 1:15:19 PM
to DynamoRIO Users
I would of course love to put the vtables and type_info metadata in shared memory as well and save myself all this hassle, but AFAIK it's not possible to control their location: they are loaded into .rodata as the client is loaded. So the second-best thing is to keep .rodata at the same virtual address for all instances of the client; otherwise, the objects created in shared memory will have a _vptr that points to a non-existent vtable/type_info, which will definitely cause issues. This is based on my understanding of "Memory Layout of C++ Object in Different Scenarios" by Vishal Chovatiya, which I hope is correct.

You said the virtual address could change: did you mean on forks? I thought fork replicates the entire virtual memory layout of the process as is, which would mean my shared memory stuff will be fine/safe on forks. If this is correct, then forks should be safe and I should only worry about execves, because they will definitely reload the client (and consequently its .rodata and the included vtables and type_infos) at another virtual address. Maybe a workaround would be to allocate some memory at the same virtual address post-execve but before any modules are loaded, and copy the .rodata into it, or something along these lines.

Mohammad Ewais

Oct 9, 2025, 2:13:15 PM
to DynamoRIO Users
Actually, trying to modify my code to move everything to shared memory is quite tedious AND error-prone. There are global variables, STL containers with custom memory allocators, class members that also need handling, etc. The problem is that if any of these is allocated wrongly, or I forget to move it to shared memory, it will be near impossible to debug my way out of it.

So, I am considering a different approach. In my client, I also track module loads and their sections. I think it might be possible to change the writable sections of the client module to MAP_SHARED, which would cause them to be shared with any child process after fork. That should take care of any global variables in my client. Something similar for memory allocated by the client would also make sense. At least that's the basic idea.

Two questions:
1. Where does DR allocate memory when the client calls new, dr_raw_mem_alloc, dr_global_alloc, etc.? I know it keeps it separate from the application, but no other info is given. I can also see that my client module and the `libdynamorio.so` module have huge footers after their last sections, at least compared to all other modules. Are these regions used for the allocation?
2. Since there is no way to directly change an existing private mapping to MAP_SHARED, I will have to resort to something like the following. Would this work as a sequence?
// After the simulator is already allocated but BEFORE fork
#define _GNU_SOURCE        // for memfd_create and MREMAP_FIXED
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void* old_sim_base = existing_simulator_address;
size_t sim_size = simulator_size;

// Create shared backing (memfd_create needs Linux >= 3.17 / glibc >= 2.27)
int memfd = memfd_create("sim", 0);
if (memfd == -1 || ftruncate(memfd, sim_size) == -1)
    /* handle error */;

// Map the shared region at a temporary location
void* temp = mmap(NULL, sim_size, PROT_READ | PROT_WRITE,
                  MAP_SHARED, memfd, 0);
close(memfd);              // the mapping keeps the backing alive

// Copy the existing data
memcpy(temp, old_sim_base, sim_size);

// Now the trick - atomically replace the old mapping with the new one
// (MREMAP_FIXED unmaps whatever was previously at old_sim_base)
void* new_base = mremap(temp, sim_size, sim_size,
                        MREMAP_MAYMOVE | MREMAP_FIXED, old_sim_base);

// old_sim_base now points to MAP_SHARED memory with the same content

Thanks a lot for your help.