Export the symbolic state

Dave

unread,

Aug 22, 2024, 1:45:11 PM8/22/24

to S2E Developer Forum

Hi,

For a project I am working on, I am trying to export some form of a symbolic state for a program during execution.

I was wondering whether it is possible to, at a certain point during program execution, acquire the locations (memory addresses) in guest memory that contain some symbolic value. Looking at the S2E code, the addresses translate from the guest to the host one-way, but I am curious whether I can also get the guest addresses back in some way.

In addition to that, I tried to acquire a concrete memory dump from QEMU when running in S2E mode (to combine with the knowledge of symbolic memory from the above), but this causes QEMU to freeze. I have not been able to determine why.

Do you have any suggestions on where I should look to implement this or address these things, and if necessary what would be a good starting point for me to implement this myself?

Thanks for your time and kind regards,

Dave

Vitaly Chipounov

unread,

Aug 23, 2024, 10:08:13 AM8/23/24

to s2e...@googlegroups.com

Hi,

On Thu, Aug 22, 2024 at 7:45 PM Dave <davequa...@gmail.com> wrote:

For a project I am working on, I am trying to export some form of a symbolic state for a program during execution.

I was wondering whether it is possible to, at a certain point during program execution, acquire the locations (memory addresses) in guest memory that contain some symbolic value. Looking at the S2E code, the addresses translate from the guest to the host one-way, but I am curious whether I can also get the guest addresses back in some way.

In general, guest virtual and guest physical memory mapping is not one to one since processes can share pages. You could write a plugin that enumerates all virtual pages in the address space of a process and then check if state->mem()->read() returns a symbolic expression for a given byte.

In addition to that, I tried to acquire a concrete memory dump from QEMU when running in S2E mode (to combine with the knowledge of symbolic memory from the above), but this causes QEMU to freeze. I have not been able to determine why.

How did you try to get the dump? If you use Windows guests, there is WindowsCrashDumpGenerator to create dumps that can be opened in WinDbg.

Vitaly

Dave

unread,

Aug 23, 2024, 2:12:25 PM8/23/24

to S2E Developer Forum

Hi,

Thank you for the quick response!

You could write a plugin that enumerates all virtual pages in the address space of a process and then check if state->mem()->read() returns a symbolic expression for a given byte.

I had looked at how I could walk through guest memory earlier, but had not been able to determine how to access the virtual pages in the address space (of the guest, or the program under analysis); at that time I just looked at the objects in the KLEE address space. Where do you suggest I look to access these pages for a certain (selected) process from a plugin?

How did you try to get the dump? If you use Windows guests, there is WindowsCrashDumpGenerator to create dumps that can be opened in WinDbg.

Apologies, I should have been more specific. I am running this with a Linux guest image. Since I wanted to dump the guest's memory at a specific point during execution, I attempted to send a dump-guest-memory command from the S2E plugin. This caused a freeze. I also dropped to a shell during S2E's bootstrap to try this manually, but this causes QEMU to exit immediately. I'm afraid I currently do not have any more information about this, but I will try this again to see if I can find anything else.

I was also wondering about the propagation of symbolic values, specifically when using a symbolic value as an index for an array access. Namely, I want to follow symbolic data from a syscall through the kernel up to some potentially vulnerable code, where I want to see e.g., whether an array access could go out of bounds. Is it the case that when an array index is symbolic, the result value of this access will be concretized? In a test program, I tried something as follows:

unsigned char access[256]; // assume initialized

unsigned char access_2[256]; // assume initialized

unsigned char idx = 0xaa; // assume symbolic

unsigned char result_1 = [idx];

unsigned char result_2 = [result_1];

In my test program, I found that it seems like result_1 is concretized, and as a result it is not possible for me to track a symbolic value as a (partial) source for result_2. I played around with the settings, such as disabling forking on symbolic pointers, but this did not change anything. Also, if this *is* possible, is it also when the data/types used are much larger (e.g., span memory pages, are of type uint64, etc.), and when traveling very far into kernel code? I assume the usual limitations of taint propagation apply, but I am curious about this specifically.

Kind regards,

Dave

Vitaly Chipounov

unread,

Aug 26, 2024, 12:09:57 PM8/26/24

to s2e...@googlegroups.com

Hi,

On Fri, Aug 23, 2024 at 8:12 PM Dave <davequa...@gmail.com> wrote:

You could write a plugin that enumerates all virtual pages in the address space of a process and then check if state->mem()->read() returns a symbolic expression for a given byte.

I had looked at how I could walk through guest memory earlier, but had not been able to determine how to access the virtual pages in the address space (of the guest, or the program under analysis); at that time I just looked at the objects in the KLEE address space. Where do you suggest I look to access these pages for a certain (selected) process from a plugin?

That's a pretty common thing that most plugins do. Look for state->mem()->read in the code base.

How did you try to get the dump? If you use Windows guests, there is WindowsCrashDumpGenerator to create dumps that can be opened in WinDbg.

Apologies, I should have been more specific. I am running this with a Linux guest image. Since I wanted to dump the guest's memory at a specific point during execution, I attempted to send a dump-guest-memory command from the S2E plugin. This caused a freeze. I also dropped to a shell during S2E's bootstrap to try this manually, but this causes QEMU to exit immediately. I'm afraid I currently do not have any more information about this, but I will try this again to see if I can find anything else.

That command probably accesses memory directly. In S2E, the guest RAM is unmapped and QEMU must perform all accesses through a wrapper.

I was also wondering about the propagation of symbolic values, specifically when using a symbolic value as an index for an array access. Namely, I want to follow symbolic data from a syscall through the kernel up to some potentially vulnerable code, where I want to see e.g., whether an array access could go out of bounds. Is it the case that when an array index is symbolic, the result value of this access will be concretized? In a test program, I tried something as follows:

In order to avoid forking, the memory access must be aligned and constrained to a 256 byte range. Otherwise, there will be many additional forks to account for various unaligned accesses that span 256 byte intervals. Check these two test cases to understand how it works:

https://github.com/S2E/s2e/blob/master/testsuite/basic9-symaddr-unaligned/main.c

https://github.com/S2E/s2e/blob/master/testsuite/basic9-symaddr/main.c

Unfortunately, it is not possible to eliminate forking completely. You could increase 256 bytes to 4K but that would be hard on the constraint solver. Going above 4K is not possible because the TLB is concrete (i.e., only the 12 lower address bits may be symbolic).

Vitaly

Dave

unread,

Aug 27, 2024, 6:38:57 AM8/27/24

to S2E Developer Forum

Hi,

That's a pretty common thing that most plugins do. Look for state->mem()->read in the code base.

In the meantime I'd found what I was looking for in some of the plugins, thanks!

That command probably accesses memory directly. In S2E, the guest RAM is unmapped and QEMU must perform all accesses through a wrapper.

Makes sense, and good to know.

In order to avoid forking, the memory access must be aligned and constrained to a 256 byte range. Otherwise, there will be many additional forks to account for various unaligned accesses that span 256 byte intervals. Check these two test cases to understand how it works:

https://github.com/S2E/s2e/blob/master/testsuite/basic9-symaddr-unaligned/main.c
https://github.com/S2E/s2e/blob/master/testsuite/basic9-symaddr/main.c

Unfortunately, it is not possible to eliminate forking completely. You could increase 256 bytes to 4K but that would be hard on the constraint solver. Going above 4K is not possible because the TLB is concrete (i.e., only the 12 lower address bits may be symbolic).

Okay, thank you for your explanation. The test cases clear that up as well.

For my situation I will then try to follow the concrete path I want to the desired code with symbolic inputs, while forking on symbolic pointers and pruning the paths that are not interesting. I assume that would come closest to following a concrete path and gathering expressions to solve without forking on symbolic pointers (or forking at all).

One final question I had was about using a different kernel than the S2E 6.8.2 Linux kernel. I want to put a kernel I have into an already built image. I looked at the s2e-linux-kernel README to put a kernel in a pre-built image. It is a bit unclear to me how I should match up the timestamps and create the ready snapshot when doing this manually.

Thanks again for your replies to my questions, they have been very helpful!

Kind regards,

Dave

Vitaly Chipounov

unread,

Aug 27, 2024, 10:19:25 AM8/27/24

to s2e...@googlegroups.com

Hi,

Regarding adding a new kernel, it's best to look at the recent commit history in s2e-linux-kernel and guest-images repo.

I don't support creating snapshots manually, there are just too many steps and many opportunities for hard to debug errors.

The steps look like this:
- Import a new kernel into s2e-linux-kernel

- Apply S2E patches on it

- Setup a kernel config file (with the S2E option enabled)

- Update the guest-images repo to use the new kernel

- Build the image.

An interesting project would be to figure out how to insert the S2E instrumentation without modifying the kernel, e.g., with systemtap or ebpf.

Vitaly

--
You received this message because you are subscribed to the Google Groups "S2E Developer Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s2e-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/s2e-dev/91acebe7-e9f6-47db-81b0-414d71ff1770n%40googlegroups.com.

Message has been deleted

Dave

unread,

Aug 27, 2024, 11:27:30 AM8/27/24

to S2E Developer Forum

Hi,

Regarding adding a new kernel, it's best to look at the recent commit history in s2e-linux-kernel and guest-images repo.
I don't support creating snapshots manually, there are just too many steps and many opportunities for hard to debug errors.
The steps look like this:
- Import a new kernel into s2e-linux-kernel
- Apply S2E patches on it
- Setup a kernel config file (with the S2E option enabled)
- Update the guest-images repo to use the new kernel
- Build the image.

The reason I wanted to use a pre-built image was because I am unable to build a new image correctly. I have followed these steps above earlier, after I had already tried to build normally.

On one machine I tried it on, I got an error during the building of the S2E 6.8.2 kernel: "error: corrupted size vs. prev_size".

On another, the build succeeds up to this point:

[Thu Aug 22 14:11:50 UTC 2024] [ubuntu-22.04-x86_64] Extracting guestfs...
fusermount -u /home/s2e/images/ubuntu-22.04-x86_64/imagefs
fusermount: entry for /home/s2e/images/ubuntu-22.04-x86_64/imagefs not found in /etc/mtab
make: [/home/s2e/source/guest-images/Makefile.linux:255: /home/s2e/images/ubuntu-22.04-x86_64/guestfs] Error 1 (ignored)

The build then freezes here:

[Thu Aug 22 14:12:38 UTC 2024] [ubuntu-22.04-x86_64] Guestfs extraction completed.
cp "/home/s2e/images/.tmp-output/linux-6.8.2-x86_64/"*/vmlinux "/home/s2e/images/ubuntu-22.04-x86_64/guestfs/vmlinux"

Kind regards,

Dave

Vitaly Chipounov

unread,

Aug 27, 2024, 11:35:53 AM8/27/24

to s2e...@googlegroups.com

Hi,

The fusermount ignored error is ok. It tries to unmount stale folders first before proceeding.

Regarding "error: corrupted size vs. prev_size", can you please add more context? What is crashing exactly?

Doing s2e image_build ... 2>&1 | tee log.txt would help.

Vitaly

--

You received this message because you are subscribed to the Google Groups "S2E Developer Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s2e-dev+u...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/s2e-dev/8f903b3d-5317-404a-bf89-8496af740d7bn%40googlegroups.com.

Reply all

Reply to author

Forward