Questions about constraints generated when symbolic data is used as array index

49 views
Skip to first unread message

Weiqi Wang

unread,
Jul 28, 2021, 10:35:19 AMJul 28
to S2E Developer Forum
Hi,

I noticed that s2e generates constraint that sets field to completely concrete when symbolic data is used as array index.

For example, the following code checks whether all given characters are in the specification set.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <s2e/s2e.h>

int main(void) {
    // suppose the characters to be checked are "c5"
    char input[32] = { 'c', '5' };

    s2e_make_symbolic(input, 32, "input");
    
    // init the spec array
    int spec[256];
    // 1 means the char is allowed (i.e. in spec)
    spec['c'] = 1;
    // 0 means the char is not allowed (i.e. not in spec)
    spec['5'] = 0;

    if(spec[input[0]]){
        printf("input[0]: 'c' is in spec\n");
    }else{
        printf("input[0]: 'c' NOT in spec\n");
    }

    if(spec[input[1]]){
        printf("input[1]: '5' is in spec\n");
    }else{
        printf("input[1]: '5' NOT in spec\n");
    }

    return 0;
}

I run the program concretely (i.e. fork disabled) and collect constraint during exeuction. The dumped query is:

array v0_input_0[32] : w32 -> w8 = symbolic
(query [(Eq 0xd4
             (Shl w64 (SExt w64 (SExt w32 (Read w8 0x1 v0_input_0)))
                      0x2))
         (Eq 0x18c
             (Shl w64 (SExt w64 (SExt w32 (Read w8 0x0 v0_input_0)))
                      0x2))]
        false []
        [v0_input_0])

Solving the constraints gives solution "c5". So using symbolic data as array index results in constraints that set the symbolic data to its current concrete value. Besides, whether the char is in spec or not doesn't affect the constraints (e.g. "5" is allowed as the second char)

I want to achieve something similar to DART (run concretely and collect constraint, then negates some constraint). These constraint are too strict. The code is just checking whether a character is alllowed while the constraints set the field value to completely concrete.

Is there a way to not generate constraint of this type? (perhaps skip generating constraint when accessing symbolic address?) Can you point me to some related files I can modify?

Best,
Weiqi

Vitaly Chipounov

unread,
Jul 28, 2021, 10:45:17 AMJul 28
to s2e...@googlegroups.com, Weiqi Wang

Hi,

This looks like the expected behavior. S2E uses concolic execution [1], which means that when the path terminates, there is no need to call the solver, the initial values (c, 5) will be used. S2E calls the solver on forks, which will create a new set of concrete inputs to be used on the forked path. To get alternative values, you need to enable forking. You can also save the constraints, negate them offline, then use the new concrete inputs in another run. S2E does not have support for this yet.

Vitaly

[1] http://s2e.systems/docs/Howtos/Concolic.html

--
--
You received this message because you are a member of the S2E Developer Forum.
To post to this group, send email to s2e...@googlegroups.com
To unsubscribe from this group, send email to s2e-dev+u...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/s2e-dev

---
You received this message because you are subscribed to the Google Groups "S2E Developer Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s2e-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/s2e-dev/b5b9163c-5f96-4f7c-91b3-487ccac7caf2n%40googlegroups.com.

Weiqi Wang

unread,
Jul 28, 2021, 11:10:56 AMJul 28
to S2E Developer Forum
Hi Vitaly,

Thanks for your reply. The way I get "c5" as the solution is I collect constraints during execution and then solve offline (as opposed to use s2e_get_example which would output concret value during concolic execution). Thus they have no knowledge of the initial value "c5".

The difference between s2e_get_example and collect constraints then solve is shown below:

int main(void) {
    char input[32] = { 'c', '5' };

    s2e_make_symbolic(input, 32, "input");

    if (input[1] >= '0' && input[1] <= '9') {
        printf("First char is a digit\n");
    } else {
        printf("First char is not a digit\n");
    }

    s2e_get_example(input, 2);

    return 0;
}

Constraints:
array v0_input_0[32] : w32 -> w8 = symbolic
(query [(Eq false
             (Slt 0x39
                  N0:(SExt w64 (Read w8 0x1 v0_input_0))))
         (Eq false (Sle N0 0x2f))]
        false []
        [v0_input_0])

The s2e_get_example gives result "c5", while solving collected constraints gives result "0xFF 0x30" (string is "ÿ0").

My point is, when symbolic data is used as array index, the constraints generated are too strict (see my message above, they directly set the input to "c5" even without the knowledge of initial input). I'm wondering where I can modify to avoid generating such constraints.

Vitaly Chipounov

unread,
Jul 28, 2021, 12:12:51 PMJul 28
to s2e...@googlegroups.com, Weiqi Wang

Hi,

Right, I forgot that by default, S2E forks and concretizes symbolic pointers. You need to disable that [1].

Vitaly

[1] https://github.com/S2E/s2e/blob/master/libs2ecore/src/S2EExecutor.cpp#L111

Weiqi Wang

unread,
Jul 28, 2021, 9:00:14 PMJul 28
to Vitaly Chipounov, s2e...@googlegroups.com
Hi Vitaly,

Thanks. I disabled ForkOnSymbolicAddress by setting cl::init(false) but it doesn't work. I observed more constraints that related to addresses. However, the constraints that sets the field to concrete are still there (the "Eq 0xd4" and "Eq 0x18c" ones)

array v0_input_0[32] : w32 -> w8 = symbolic                                                                                              
(query [(Eq 0xd4
             N0:(Shl w64 (SExt w64 (SExt w32 (Read w8 0x1 v0_input_0)))
                         0x2))
         (Eq 0x0
             (And w64 N1:(Add w64 0x7fffaf7f5b90 N0) 0x3))
         (Eq 0x7fffaf7f5000 (And w64 N1 0xfffffffffffff000))
         (Eq 0x7fffaf7f5 (LShr w64 N1 0xc))
         (Eq 0x18c
             N2:(Shl w64 (SExt w64 (SExt w32 (Read w8 0x0 v0_input_0)))
                         0x2))
         (Eq 0x0
             (And w64 N3:(Add w64 0x7fffaf7f5b90 N2) 0x3))
         (Eq 0x7fffaf7f5000 (And w64 N3 0xfffffffffffff000))
         (Eq 0x7fffaf7f5 (LShr w64 N3 0xc))]
        false []
        [v0_input_0])

Vitaly Chipounov

unread,
Jul 29, 2021, 5:34:35 AMJul 29
to Weiqi Wang, s2e...@googlegroups.com

Hi,

Here is a slightly tweaked program that should behave properly (4 states).

- Made the array aligned
- Made it 256-byte long
- Checked that -Wall doesn't return any warnings. Your original code had symbolic signed indexes, which could go negative during symbolic execution.

The reason for this is because S2E's symbolic pointer handling has various limitations:
- It cannot handle symbolically pointers that span multiple pages.
- It restricts symbolic pointers to chunks of 256 bytes. Anything larger than that will cause many forks to handle all possible cases of unaligned accesses.


#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <s2e/s2e.h>

__attribute__ ((aligned (0x1000))) unsigned char spec[256] = {0};



int main(void) {
    // suppose the characters to be checked are "c5"

    unsigned char input[32] = { 0 };
    input[0] = 'c';
    input[1] = '5';



    s2e_make_symbolic(input, 32, "input");
   
    // init the spec array

    // 1 means the char is allowed (i.e. in spec)
    spec['c'] = 1;
    // 0 means the char is not allowed (i.e. not in spec)
    spec['5'] = 0;

    if(spec[input[0]]){
        printf("input[0]: 'c' is in spec\n");
    }else{
        printf("input[0]: 'c' NOT in spec\n");
    }

    if(spec[input[1]]){
        printf("input[1]: '5' is in spec\n");
    }else{
        printf("input[1]: '5' NOT in spec\n");
    }

    return 0;
}


Vitaly

Weiqi Wang

unread,
Jul 29, 2021, 9:35:04 AMJul 29
to S2E Developer Forum
Hi Vitaly,

Thank you for your reply. I tried the tweaked program but it still generates very strict constraints when accessing symbolic addresses. What I'm looking for is a way to NOT generate constraints when accessing symbolic addresses.

For example, in real world network protocol implementation where I set the recv_buf to be symbolic, the program uses the code similar to the one I showed above to check whether the bytes in a field are allowed in specification.

Suppose the allowed characters for a field_x are "a-z" (all lowercase letters). The program would set lowercase letters to be allowed in spec
unsigned char spec[256] = {0}
spec['a'] = 1
spec['b'] = 1
...
spec['z'] = 1

When field_x in recv_buf contains "c9", the program would loop through each byte and check the specification:
while (spec[i] != '\n'){
    if(spec[i]){
        // character in spec
    }else{
        //character not in spec. the field contains invalid characters. abort
    }
}

The problem is, the generated constraints depends on the initial value in field_x. Specifically, if field_x has initial value "c9", the solution of collected constraint would be "c9". If field_x has initial value "d0", the solution of collected constraint would be "d0". This is too strict because actually field_x is allowed to contain any lowercase characters.

I guess capturing the fact that field_x is allowed to contain any lowercase letters and generate corresponding constraints would be hard(Please correct me if I am wrong). That's why I'm seeking a way to avoid generating constraints for access to symbolic addresses.

Here are the links if you want to have a look at the code
[1] is the example I give above
[2] is an example from libc. When calling tolower, any letters are accepted. But the generated constraints would set the input to whatever the value it is at the time of calling


Here are the generated constraints for the tweaked program. You can see it requires the byte to be exactly 0x35 and 0x63.
array v0_input_0[32] : w32 -> w8 = symbolic                                                                                               
(query [(Eq 0x35 (Read w8 0x1 v0_input_0))
         (Eq 0x63 (Read w8 0x0 v0_input_0))]
        false []
        [v0_input_0])

Vitaly Chipounov

unread,
Jul 29, 2021, 10:18:36 AMJul 29
to s2e...@googlegroups.com, Weiqi Wang

Hi,

It's not clear to me why you would get the constraints you mention. Please find attached the query log and the project for the program I gave you. You will see queries of this form:

# Query 10 -- Type: InitialValues, Instructions: 28685
array const_arr1[4096] : w32 -> w8 = [0x0 ... 0x0]


array v0_input_0[32] : w32 -> w8 = symbolic
(query [(Eq false

             (Eq 0x0
                 (Read w8 (ZExt w32 (Read w8 0x1 v0_input_0))
                          U0:[0xfff=0x0, 0xffe=0x0, 0xffd=0x0, 0xffc=0x0, 0xffb=0x0, 0xffa=0x0, ... 0x3=0x0, 0x2=0x0, 0x1=0x0, 0x0=0x0] @ const_arr1)))
         (Eq 0x0
             (Read w8 (ZExt w32 (Read w8 0x0 v0_input_0))
                      U0))]
        false []
        [v0_input_0])
#   OK -- Elapsed: 5.237028e-01
#   Solvable: true
#     v0_input_0 = [100,99,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100]

You get an array of constants that contains the fixed spec array. The symbolic expressions reads from that array.

Use the following settings in s2e-config.lua, don't modify the settings in the source.

kleeArgs = {
      "--fork-on-symbolic-address=false",
      "--use-query-log=all:kquery",
}

Vitaly

all-queries.kquery
test123.tar.xz.zip

Weiqi Wang

unread,
Jul 29, 2021, 8:34:59 PMJul 29
to S2E Developer Forum
Hi Vitaly, 

Thank you so much for your reply and the attachment. I figured it out and am able to get the same result as yours. I'm trying to apply the configuration to real world program project. With fork-on-symbolic-address disabled, I observed that: for pjsip s2e often gets stuck for a long time and finally qemu segfaults and for baresip it gets totally stuck at the beginning. s2e was working well when I ran it with fork-on-symbolic-address enabled. I ran s2e with fork disabled in both cases so state explosion is not the cause, I guess.

Questions:
Why does disabling fork-on-symbolic-address causes longer runtime and even segfaults? Can you give me some hints on how to debug this?
Is there a way to let s2e run with fork-on-symbolic-address enabled while emitting the constraints with fork-on-symbolic-address disabled? I only need to collect the loose constraints after all.

Best,
Weiqi


Weiqi Wang

unread,
Jul 29, 2021, 11:22:45 PMJul 29
to S2E Developer Forum
BTW. If s2e doesn't fork on symbolic address, does it maintain the full symbolic state of the pointer? Is it more likely to run out of memory in this case?

Vitaly Chipounov

unread,
Jul 30, 2021, 12:18:52 PMJul 30
to s2e...@googlegroups.com, Weiqi Wang

Hi,

Symbolic pointers in S2E are not ready for real-world programs yet, that's why they are disabled by default. Symbolic pointers make it much harder to solve constraints. Symbolic execution engines that have access to the source code, e.g., KLEE, can reason more precisely about them, by sending to the solver the minimum required (e.g., your spec array). S2E however, does not know anything about this, so it has to send entire memory pages instead. This unfortunately breaks the solver. Perhaps it's possible to rewrite the queries to make it easier on the solver, but I am no expert on this.

Something else you could do is rewrite your symbolic array accesses in terms of it-then-else. You could have a function spec that looks like this:
uint8 spec(uint8 index) {
  if (index == 0) return 'a';
  if (index == 1) return 'b';
  ...
  if (index == 255) return ...;
}

You will end up with simpler constraints and DART-style concolic execution should have no problems extracting them and negating the desired ones to produce alternate concrete inputs.

Vitaly

Weiqi Wang

unread,
Jul 30, 2021, 3:20:04 PMJul 30
to S2E Developer Forum
Hi Vitaly,

Thanks. Perhaps I can workaround it by hooking onSymbolicAddress to record symbolic pointer constraints while let s2e execute the program with concretized symbolic pointers. I'm looking at the source code and got a little confused. 
  1. The configuration --fork-on-symbolic-address=false is passed to g_s2e_fork_on_symbolic_address at https://github.com/S2E/s2e/blob/0be5c89df54b534025ea078488889ccbf20d6344/libs2ecore/src/S2EExecutor.cpp#L431
  2. Then libcpu/src/softmmu_header.h calls INSTR_FORK_AND_CONCRETIZE(val, max) at https://github.com/S2E/s2e/blob/0be5c89df54b534025ea078488889ccbf20d6344/libcpu/src/softmmu_header.h#L106
  3. INSTR_FORK_AND_CONCRETIZE(val, max) calls tcg_llvm_fork_and_concretize at https://github.com/S2E/s2e/blob/0be5c89df54b534025ea078488889ccbf20d6344/libcpu/src/softmmu_header.h#L85
  4. tcg_llvm_fork_and_concretize is defined as handleForkAndConcretize https://github.com/S2E/s2e/blob/0be5c89df54b534025ea078488889ccbf20d6344/libs2ecore/src/FunctionHandlers.cpp#L336
I checked klee::ref<klee::Expr> address = args[0] in https://github.com/S2E/s2e/blob/0be5c89df54b534025ea078488889ccbf20d6344/libs2ecore/src/FunctionHandlers.cpp#L56. With --fork-on-symbolic-address being true and false, the address contains different expression. Besides, doConcretize is always false.

fork-on-symbolic-address=true: onSymbolicAddress at 0x55fa5a4679e3 (reason 0): 0x55fa5a66a063 (Add w64 0x55fa5a66a000
          (ZExt w64 (Read w8 0x0 v0_input_0)))

fork-on-symbolic-address=false: onSymbolicAddress at 0x55b9a68ee9e3 (reason 0): 0x55b9a6af1 (LShr w64 (Add w64 0x55b9a6af1000
                    (ZExt w64 (Read w8 0x0 v0_input_0)))
           0xc)

Has concretization already happend somewhere before handleForkAndConcretize?

Vitaly Chipounov

unread,
Jul 30, 2021, 5:49:17 PMJul 30
to s2e...@googlegroups.com, Weiqi Wang

Hi,

You should see two calls to forkAndConcretize in most cases [1]: one for the whole address and another one for the upper 20 bits. Those must always be concrete, as they are used as an index for the TLB and walking page tables. S2E only allows the lower 12 bits to be symbolic. That's why you see the right shift by 12 bits in the fork-on-symbolic-address=false case. You don't see it in the true case, because the address was fully concretized by the first call, so there is nothing symbolic left.

Regarding doConcretize, is should be set by plugins that hook the onSymbolicAddress event.

Vitaly

[1] https://github.com/S2E/s2e/blob/0be5c89df54b534025ea078488889ccbf20d6344/libcpu/src/softmmu_header.h#L160

Weiqi Wang

unread,
Aug 9, 2021, 10:04:17 AMAug 9
to S2E Developer Forum
Hi Vitaly,

Thanks for the explanation. It's very helpful!

About your previous message, It's not clear to me why s2e can segfault due to complex symbolic pointer constraints breaking the solver. I run s2e with fork disabled. Why does it still needs to query the solver? 

    Symbolic pointers in S2E are not ready for real-world programs yet, that's why they are disabled by default. Symbolic pointers make it much harder to solve constraints.

    S2E however, does not know anything about this, so it has to send entire memory pages instead. This unfortunately breaks the solver.

Reply all
Reply to author
Forward
0 new messages