In general, a large number of aligned/disaligned regions is not
uncommon, especially for complex programs (our traces for Adobe Reader
had as many as ~1000 disaligned regions, and they were much smaller
than your 4GB traces). In such cases, additional techniques are needed
to identify the regions that are of interest for your particular
analysis.

Why are there so many? In addition to the execution differences caused
by the input differences (which I presume are what you are most
interested in), other factors also increase the number of aligned/
disaligned regions: non-determinism, scheduling differences,
interactions with other threads, etc. The exact effect of these factors
will vary from program to program. To get an idea of how many of your
3800 regions can be attributed to them, try collecting two separate
traces using identical inputs and aligning them. How many disaligned
regions are there?
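
To illustrate the baseline experiment, here is a minimal Python sketch.
It assumes each trace can be reduced to a sequence of instruction
addresses, and it substitutes a plain LCS-style diff
(difflib.SequenceMatcher) for tracealign's own alignment, so its counts
will not match tracealign's. The helper count_disaligned_regions and
the three traces are hypothetical.

```python
from difflib import SequenceMatcher


def count_disaligned_regions(trace_a, trace_b):
    """Count contiguous disaligned regions between two traces.

    Hypothetical helper: a simplified stand-in for trace alignment that
    diffs instruction addresses. This is NOT tracealign's algorithm.
    """
    # autojunk=False so frequently executed addresses (hot loops) are not
    # discarded by difflib's "popular element" heuristic on long traces.
    matcher = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    # get_opcodes() yields (tag, i1, i2, j1, j2); every tag other than
    # "equal" marks one contiguous region where the traces differ.
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


# Hypothetical instruction-address traces (not real tracealign output).
run_1 = [0x400, 0x404, 0x408, 0x40C, 0x410]
run_2 = [0x400, 0x404, 0x500, 0x408, 0x40C, 0x410]  # same input, one nondeterministic detour
run_3 = [0x400, 0x600, 0x408, 0x40C, 0x604]         # different input

baseline = count_disaligned_regions(run_1, run_2)
with_input_change = count_disaligned_regions(run_1, run_3)
print(f"identical inputs: {baseline}")            # prints 1
print(f"different inputs: {with_input_change}")   # prints 2
```

If the identical-input baseline already produces a large share of the
regions you see with different inputs, most of your 3800 regions are
probably noise rather than input-driven behavior. The subtraction is
only a rough indicator, since noise and input-driven regions can
overlap or merge.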

The warning about nothing being pushed onto the stack indicates the
presence of certain types of unstructured control flow, for example
due to interrupt instructions, setjmp/longjmp calls, rep instructions,
etc. The tracealign tool handles these cases, so the warning probably
does not affect the correctness of your results. If you have concerns,
you can look into each case (the warning message prints the
instruction in question).

-Noah