Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Strange structure initialization problem.

118 views
Skip to first unread message

DSF

unread,
Feb 8, 2015, 11:13:50 PM2/8/15
to
Hello.

The following code has me completely baffled.

In a header file:

typedef struct tag_IOERRORS
{
DWORD ioerror;
DWORD syserror;
} IOERRORS;

For our purposes, consider DWORD to be an unsigned 32-bit value.

The .C file (Yes, it's a C file, I'll explain why I'm posting here
soon):

IOERRORS GetVolumeInfoW(const wchar_t *path, VOLUMEINFOW *vi)
{
IOERRORS ret = {0, 0};
...
return ret;
}
Sets the struct's members used for the return value to 0.

The problem is it only does this the first time GetVolumeInfo is
executed. Subsequent calls do not set ret's members to 0. Reading
them produces values with no correlation to any valid errors at all.

Now why I'm in this group.

Debugging this code should produce something like:
(ret is at [ebp-4] and [ebp-8].)
mov [ebp-4], 0
mov [ret-8], 0

What's compiled is:

mov eax, [FBaseString<wchar_t>::blank + 0x30]
mov [ebp-0x08], eax
mov eax, [0x42A698]
mov [ebp-0x04], eax

This is wrong on multiple levels.

1. The compiler doesn't use immediate 0s, but two memory locations.
2. The memory locations are referenced in entirely different ways.
3. [FBaseString<wchar_t>::blank + 0x30] refers to an offset from a
string template class static data member and doesn't even belong in a
C source file. (The C file *is* being compiled my the C compiler.)
The only reason this C file is part of the project and not a library
call is so that I can debug it properly.
4. I have no idea what/where [0x42a698] refers to.

ret is on the call stack and not the local stack because the
compiler handles returns bigger than 32-bits by pushing a pointer to
of the returned variable on the stack before the call and storing data
in the return variable via the pointer.

Any ideas why this is happening?

DSF
"'Later' is the beginning of what's not to be."
D.S. Fiscus

Melzzzzz

unread,
Feb 8, 2015, 11:19:24 PM2/8/15
to
Show your code. Seems that you initialize ret with these two values...

Ian Collins

unread,
Feb 8, 2015, 11:53:40 PM2/8/15
to
DSF wrote:
> Hello.
>
> The following code has me completely baffled.
>
> In a header file:
>
> typedef struct tag_IOERRORS
> {
> DWORD ioerror;
> DWORD syserror;
> } IOERRORS;


Maybe the compiler is confused by all caps types :)

<snip>

> This is wrong on multiple levels.

0) You are looking at debugger output within a C++ application?

> 1. The compiler doesn't use immediate 0s, but two memory locations.
> 2. The memory locations are referenced in entirely different ways.
> 3. [FBaseString<wchar_t>::blank + 0x30] refers to an offset from a
> string template class static data member and doesn't even belong in a
> C source file. (The C file *is* being compiled my the C compiler.)
> The only reason this C file is part of the project and not a library
> call is so that I can debug it properly.

Eh? Why can't you debug it in a library?

You need to post more code, something self contained that shows the problem.

--
Ian Collins

DSF

unread,
Feb 9, 2015, 12:57:53 AM2/9/15
to
On Mon, 09 Feb 2015 17:53:29 +1300, Ian Collins <ian-...@hotmail.com>
wrote:

>DSF wrote:
>> Hello.
>>
>> The following code has me completely baffled.
>>
>> In a header file:
>>
>> typedef struct tag_IOERRORS
>> {
>> DWORD ioerror;
>> DWORD syserror;
>> } IOERRORS;
>
>
>Maybe the compiler is confused by all caps types :)

One of the few Windows styles I've adopted. (I hate the hungarian
notation in Windows because it goes against my coding style.)
>
><snip>
>
>> This is wrong on multiple levels.
>
>0) You are looking at debugger output within a C++ application?

I am looking at an assembly-level view of the source code in my IDE.


>> 1. The compiler doesn't use immediate 0s, but two memory locations.
>> 2. The memory locations are referenced in entirely different ways.
>> 3. [FBaseString<wchar_t>::blank + 0x30] refers to an offset from a
>> string template class static data member and doesn't even belong in a
>> C source file. (The C file *is* being compiled my the C compiler.)
>> The only reason this C file is part of the project and not a library
>> call is so that I can debug it properly.
>
>Eh? Why can't you debug it in a library?

Explained in my updated post to my post.

>
>You need to post more code, something self contained that shows the problem.

See previous reply.

DSF

unread,
Feb 9, 2015, 1:00:27 AM2/9/15
to
I do initialize ret. To zero and zero. See my updated reply to my
post for more.

Ian Collins

unread,
Feb 9, 2015, 1:26:54 AM2/9/15
to
DSF wrote:
>
> Explained in my updated post to my post.

Which updated post?

--
Ian Collins

David Brown

unread,
Feb 9, 2015, 3:02:54 AM2/9/15
to
On 09/02/15 05:13, DSF wrote:
> Hello.
>
> The following code has me completely baffled.
>
> In a header file:
>
> typedef struct tag_IOERRORS
> {
> DWORD ioerror;
> DWORD syserror;
> } IOERRORS;
>
> For our purposes, consider DWORD to be an unsigned 32-bit value.
>
> The .C file (Yes, it's a C file, I'll explain why I'm posting here
> soon):
>

A ".C" file is normally treated as a C++ file - if you want to compile
as plain C, use ".c". Your compiler may use non-standard conventions,
but although you are asking about specific code generation you haven't
told us what compiler you are using.

(Of course using ".C" for C++ is a silly idea - it is much clearer to
use ".cpp", but to be safe always use lower-case ".c" for C files.)

You said you made another post with more information, but I think it got
lost somewhere on the way to the newsserver. Try posting again, with
compilable code demonstrating the problem, and information about the
compiler you are using.


Lőrinczy Zsigmond

unread,
Feb 9, 2015, 5:53:03 AM2/9/15
to
Is it just me, or did you really not specify the exact platform and
compiler?

Can you compiler generate Assembly listing (like gcc -S)? That would be
interesting to compare with the debugger-disassembly.

Scott Lurndal

unread,
Feb 9, 2015, 9:38:19 AM2/9/15
to
DSF <nota...@address.here> writes:
>Hello.
>
> The following code has me completely baffled.
>
> In a header file:
>
>typedef struct tag_IOERRORS
>{
> DWORD ioerror;
> DWORD syserror;
>} IOERRORS;
>
> For our purposes, consider DWORD to be an unsigned 32-bit value.
>
> The .C file (Yes, it's a C file, I'll explain why I'm posting here
>soon):
>
>IOERRORS GetVolumeInfoW(const wchar_t *path, VOLUMEINFOW *vi)
>{
> IOERRORS ret = {0, 0};
>...
> return ret;
>}
> Sets the struct's members used for the return value to 0.
>
> The problem is it only does this the first time GetVolumeInfo is
>executed. Subsequent calls do not set ret's members to 0. Reading
>them produces values with no correlation to any valid errors at all.
>
> Now why I'm in this group.
>
> Debugging this code should produce something like:
> (ret is at [ebp-4] and [ebp-8].)
> mov [ebp-4], 0
> mov [ret-8], 0

Canonically (and optimally), the compiler should
generate (in at&t syntax):

xor %eax, %eax
mov %eax, 4(%ebp)
mov %eax, 8(%ebp)

>
> What's compiled is:
>
> mov eax, [FBaseString<wchar_t>::blank + 0x30]
> mov [ebp-0x08], eax
> mov eax, [0x42A698]
> mov [ebp-0x04], eax
>
> This is wrong on multiple levels.
>
> 1. The compiler doesn't use immediate 0s, but two memory locations.

Using the xor (a 1-byte insn), above, is more efficient than including 4 bytes of
zeros in the instruction stream twice.

> 2. The memory locations are referenced in entirely different ways.
> 3. [FBaseString<wchar_t>::blank + 0x30] refers to an offset from a
>string template class static data member and doesn't even belong in a

You didn't mention what compiler you're using, I'll assume some
version of a microsoft compiler from your use of intel assembler
syntax instead of at&t assembler syntax.

In any case, the linker may place a relocatable read-only constant
anywhere in the codefile where the memory attributes (e.g. RO) match.
As there is no associated symbol, the disassembler will use the closest
symbol with a smaller address and add an offset. If the offset it
too large (e.g. for address 0x42a698), the disassembler will simply
show the address.

DSF

unread,
Feb 9, 2015, 4:08:04 PM2/9/15
to
On Mon, 09 Feb 2015 17:53:29 +1300, Ian Collins <ian-...@hotmail.com>
wrote:

I replied to this last night, but I don't see it in the group.
Apologies if a second shows up!

>DSF wrote:
>> Hello.
>>
>> The following code has me completely baffled.
>>
>> In a header file:
>>
>> typedef struct tag_IOERRORS
>> {
>> DWORD ioerror;
>> DWORD syserror;
>> } IOERRORS;
>
>
>Maybe the compiler is confused by all caps types :)

That's one of the few styles I adopted from M$, typdef'd structs
being in all upper case. I can't stand Hungarian Notation, however.

><snip>
>
>> This is wrong on multiple levels.
>
>0) You are looking at debugger output within a C++ application?
>
>> 1. The compiler doesn't use immediate 0s, but two memory locations.
>> 2. The memory locations are referenced in entirely different ways.
>> 3. [FBaseString<wchar_t>::blank + 0x30] refers to an offset from a
>> string template class static data member and doesn't even belong in a
>> C source file. (The C file *is* being compiled my the C compiler.)
>> The only reason this C file is part of the project and not a library
>> call is so that I can debug it properly.
>
>Eh? Why can't you debug it in a library?
>
>You need to post more code, something self contained that shows the problem.

More in my reply to my original post. Which was supposed to be
posted last night, but was delayed.

DSF

unread,
Feb 9, 2015, 4:27:29 PM2/9/15
to
On Mon, 09 Feb 2015 09:02:44 +0100, David Brown
<david...@hesbynett.no> wrote:

>On 09/02/15 05:13, DSF wrote:
>> Hello.
>>
>> The following code has me completely baffled.
>>
>> In a header file:
>>
>> typedef struct tag_IOERRORS
>> {
>> DWORD ioerror;
>> DWORD syserror;
>> } IOERRORS;
>>
>> For our purposes, consider DWORD to be an unsigned 32-bit value.
>>
>> The .C file (Yes, it's a C file, I'll explain why I'm posting here
>> soon):
>>
>
>A ".C" file is normally treated as a C++ file - if you want to compile
>as plain C, use ".c". Your compiler may use non-standard conventions,
>but although you are asking about specific code generation you haven't
>told us what compiler you are using.

I'm working in Windows NTFS case-insensitive file system. And even
so the compiler is always mangling the case of my names. The example,
"GetVolumeInfo.c" was saved as shown. On the disk it is now
"getvolumeinfo.c". Sometimes it does this, sometimes it leaves the
case alone.

Also, for any item in the project list you can look up (and also
change) what is used to process the file. For "GetVolumeInfo.c" it's
CCompile, not CPPCompile.

And once again, I must admit I'm using Borland C/C++ 5.01A from the
mid '90s. I don't have time right now to learn the operation of a new
compiler, let alone the probability of having to alter every piece of
source code I've written.

>(Of course using ".C" for C++ is a silly idea - it is much clearer to
>use ".cpp", but to be safe always use lower-case ".c" for C files.)

(See above.)

>You said you made another post with more information, but I think it got
>lost somewhere on the way to the newsserver. Try posting again, with
>compilable code demonstrating the problem, and information about the
>compiler you are using.

I got caught up in trying various permutations of the code and
discovering it did something else unexpected. It will be posted soon
after this.

DSF

unread,
Feb 9, 2015, 4:50:01 PM2/9/15
to
On Mon, 09 Feb 2015 14:38:08 GMT, sc...@slp53.sl.home (Scott Lurndal)
wrote:
Yes. I realize I was on the wrong side of ebp. My mistake. I
usually don't use a stack frame in my assembly code, or if I do the
assembler generates the epb offsets automatically. Also I didn't know
at the time that they made a local copy of ret instead of using the
pointer to the LHS variable that's on the stack.

The optimization guide for my AMD says that
mov [ebp+4], 0
is faster than
mov [ebp+4], eax .

>>
>> What's compiled is:
>>
>> mov eax, [FBaseString<wchar_t>::blank + 0x30]
>> mov [ebp-0x08], eax
>> mov eax, [0x42A698]
>> mov [ebp-0x04], eax
>>
>> This is wrong on multiple levels.
>>
>> 1. The compiler doesn't use immediate 0s, but two memory locations.
>
>Using the xor (a 1-byte insn), above, is more efficient than including 4 bytes of
>zeros in the instruction stream twice.
(See above.)
>
>> 2. The memory locations are referenced in entirely different ways.
>> 3. [FBaseString<wchar_t>::blank + 0x30] refers to an offset from a
>>string template class static data member and doesn't even belong in a
>
>You didn't mention what compiler you're using, I'll assume some
>version of a microsoft compiler from your use of intel assembler
>syntax instead of at&t assembler syntax.

And once again, I must admit I'm using Borland C/C++ 5.01A from the
mid '90s. I don't have time right now to learn the operation of a new
compiler, let alone the probability of having to alter every piece of
source code I've written.

>In any case, the linker may place a relocatable read-only constant
>anywhere in the codefile where the memory attributes (e.g. RO) match.
>As there is no associated symbol, the disassembler will use the closest
>symbol with a smaller address and add an offset. If the offset it
>too large (e.g. for address 0x42a698), the disassembler will simply
>show the address.

The point being it shouldn't be using memory at all. There are two
immediate value zeros there. I can come up with no reason to replace
them with memory. Worse, although the compilation from the library
version specifies the address of the memory location and the content
never changes from zero, In the C compilation of the CPP compiler the
two strange addresses *DO* change! On the second call to
GetVolumeInfo, they contain non-zero values. And since GetVolumeInfo
only changes ret if there is an error, this causes a false failure
down the line. (Which is how this all started.)

Scott Lurndal

unread,
Feb 9, 2015, 5:01:28 PM2/9/15
to
DSF <nota...@address.here> writes:
>On Mon, 09 Feb 2015 14:38:08 GMT, sc...@slp53.sl.home (Scott Lurndal)
>wrote:

>>
>>Canonically (and optimally), the compiler should
>>generate (in at&t syntax):
>>
>> xor %eax, %eax
>> mov %eax, 4(%ebp)
>> mov %eax, 8(%ebp)
>
> Yes. I realize I was on the wrong side of ebp. My mistake. I
>usually don't use a stack frame in my assembly code, or if I do the
>assembler generates the epb offsets automatically. Also I didn't know
>at the time that they made a local copy of ret instead of using the
>pointer to the LHS variable that's on the stack.
>
> The optimization guide for my AMD says that
> mov [ebp+4], 0
>is faster than
> mov [ebp+4], eax .

Can you provide a reference? I can't find that in 40546_3_07.PDF.

I can see 'movzbl #0, 4(%ebp)', as that will include a
one-byte constant in the instruction stream, keeping the
icache footprint managable and reducing presure on the
32-bit register space. However, in long mode (64-bit),
with 8 more gp registers, using RAX as I show above is
preferred.

>

>
>>In any case, the linker may place a relocatable read-only constant
>>anywhere in the codefile where the memory attributes (e.g. RO) match.
>>As there is no associated symbol, the disassembler will use the closest
>>symbol with a smaller address and add an offset. If the offset it
>>too large (e.g. for address 0x42a698), the disassembler will simply
>>show the address.
>
> The point being it shouldn't be using memory at all. There are two

Twenty years of optimization enhancements to compilers are
missing from your Borland compiler, it appears.

Ian Collins

unread,
Feb 9, 2015, 5:08:07 PM2/9/15
to
Maybe you now have a good reason to invest time in becoming familiar
with a modern compiler?

It's not only the language that has come a long way in 20 years, modern
compiler code generation is unrecognisable compared to ancient
compilers. Your 90s compiler will have no support for current
processors and extensions.

--
Ian Collins

DSF

unread,
Feb 9, 2015, 5:24:08 PM2/9/15
to
On Sun, 08 Feb 2015 23:13:34 -0500, DSF <nota...@address.here>
wrote:

Hello.
>
> The following code has me completely baffled.
>
> In a header file:
>
>typedef struct tag_IOERRORS
>{
> DWORD ioerror;
> DWORD syserror;
>} IOERRORS;
>
> For our purposes, consider DWORD to be an unsigned 32-bit value.
>
> The .C file (Yes, it's a C file, I'll explain why I'm posting here
>soon):
>
>IOERRORS GetVolumeInfoW(const wchar_t *path, VOLUMEINFOW *vi)
>{
> IOERRORS ret = {0, 0};
>...
> return ret;
>}
> Sets the struct's members used for the return value to 0.
>
> The problem is it only does this the first time GetVolumeInfo is
>executed. Subsequent calls do not set ret's members to 0. Reading
>them produces values with no correlation to any valid errors at all.
>

Sorry for being late with this reply, but I've been doing further
testing.

First of all:

And once again, I must admit I'm using Borland C/C++ 5.01A from the
mid '90s. I don't have time right now to learn the operation of a new
compiler, let alone the probability of having to alter every piece of
source code I've written.


Update.

The reason The C file was in the project is sometimes I get a
messagebox when stepping into a library function telling me the source
is not available and the debugging continues in uncommented assembly
code. One time (thankfully only one time) the compiler redisplayed
the messagebox between *each step*. (AARRRGGGHHH!)

I removed the C file from the project and was able to follow the
source as compiled by the library project. The following represents
the library compilation.

DWORD ret = {0, 0};

retrieves the two zeros from two contiguous 32-bit sections (DWORDs)
of memory and places them in the two local members of ret. Note that
GetVolumeInfo never writes back to these two DWORDs.

When my compiler needs to return a value larger than 32 bits, it
pushes the address of the LHS variable onto the stack just before the
call. Normally, this pointer to the LHS variable is used in lieu of
ret. But in this case, a local ret is created and initialized as
above. When return ret is executed, the local copy of ret is copied
to the address of the LHS variable.

I am used to seeing the LHS variable pointer being used directly.
The only reason I could see to use a temporary is if the function had
multiple returns with different IOERRORS-style variables. (It does
not.)

I have no idea why it's taking the contents of two memory locations
containing zero instead of using immediate values. The net effect is
whatever is in those two memory locations initializes ret. Note again
that GetVolumeInfo does not write back to these locations.

In the library version, the two memory locations are not altered
between calls. In the extremely weird results of compiling with the
C++ project, the contents of the two memory locations *do* change and
so ret is initialized to whatever they happen to contain. And what
they happen to contain are two non-zero values which causes a false
failure down the line.

Assembly for stand alone compilation: (library)

_DATA segment dword public use32 'DATA'
$aebkenaa label byte
dd 0
dd 0
_DATA ends

...

; IOERRORS ret = {0, 0};
;
?live1@160: ; EBX = vi
@6:
mov eax,dword ptr [$aebkenaa]
mov dword ptr [ebp-8],eax
mov eax,dword ptr [$aebkenaa+4]
mov dword ptr [ebp-4],eax

...

; return ret;
;
?live1@592: ;
@8:
mov edx,dword ptr [ebp+8] ; LHS ptr
mov ecx,dword ptr [ebp-8] ; Local ret
mov dword ptr [edx],ecx ; Local 2 LHS
mov ecx,dword ptr [ebp-4] ; Same for
mov dword ptr [edx+4],ecx ; other struct
mov eax,dword ptr [ebp+8] ; member
;
; }

So three mysteries:

1. Why use zero'd memory to initialize ret instead of zeros?
(There are two other functions in GetVolumeInfo.C that use ret
initialized as explained and they all have their own separate memory
locations to initialize the local ret.)

2. Why use a temp for ret instead of using it's pointer?

3. Why would the compiler change the initialization addresses when
this C file is compiled with the C compiler in a C++ project? (As
opposed to the library, which is a C project.)

(Scott Lurndal has probably answered this one, although in the CPP
project, the memory locations are anything but "read only constants.")

I can fix the problem by never including GetVolumeInfo.c in a CPP
project and only use the library version. But that doesn't answer the
questions of initialization and doesn't guarantee this won't happen to
another C file in a CPP project. Sometimes I find bugs in library
code and include C source files into the CPP project where the bug was
discovered for debugging. Once fixed, they are recompiled back into
the library and removed from the CPP project.

Thanks for any help.

Ian Collins

unread,
Feb 9, 2015, 5:39:09 PM2/9/15
to
DSF wrote:
>
> I have no idea why it's taking the contents of two memory locations
> containing zero instead of using immediate values. The net effect is
> whatever is in those two memory locations initializes ret. Note again
> that GetVolumeInfo does not write back to these locations.
>
> In the library version, the two memory locations are not altered
> between calls. In the extremely weird results of compiling with the
> C++ project, the contents of the two memory locations *do* change and
> so ret is initialized to whatever they happen to contain. And what
> they happen to contain are two non-zero values which causes a false
> failure down the line.

You undoubtedly have either 1) a compiler bug or 2) a memory corrupting
bug in your code.

Case 2) would appear more likely, so either change your compile options
to use read only literals or set a break-point on writes to those
locations. If you can't do both of those, that's yet another reason to
upgrade!

--
Ian Collins

DSF

unread,
Feb 10, 2015, 12:56:32 AM2/10/15
to
On Sun, 08 Feb 2015 23:13:34 -0500, DSF <nota...@address.here>
wrote:

Hello.

Info below.
The compiler treats initialization of a structure the same way it
treats initialization of a character array. As in:

void Bar(void)
{
char foo[] = {"This is foo!"};
...

In this case, the text is not constant, but is expected to contain
"This is foo!" after the above line every time Bar is run. Most
compilers, I imagine, will store the literal text string in a static
area and copy it to local "foo" memory at the start of Bar.

The code:

UINT array[20] = {1, 2, 3, 4};

Produces:
_DATA segment dword public use32 'DATA'
align 4
$mkbknmaa label dword
dd 1
dd 2
dd 3
dd 4
db 64 dup(?)
_DATA ends
...
; UINT array[20] = {1, 2, 3, 4};
mov esi,offset $mkbknmaa
lea edi,dword ptr [ebp-116]
mov ecx,20
rep movsd

I believe this is what my compiler is automatically doing with any
structure or array initialization. It stores the literal numeric
values in a static area and then copies them into the structure. If I
change the original to:

IOERRORS ret;

ret.ioerror = 0;
ret.syserror = 0;

It becomes:
; IOERRORS ret; // = {0, 0};
; ret.ioerror = 0;
@6:
xor eax,eax
mov dword ptr [ebp-8],eax
;
; ret.syserror = 0;
xor edx,edx
mov dword ptr [ebp-4],edx

Still stupid code. And it still creates/uses a local ret and copies
it to the LHS pointer on the stack at the end, even though there is
only one exit point. But at least I understand now; it's a
boilerplate for initialization.

> 2. The memory locations are referenced in entirely different ways.

That, as has been mentioned, is in linker territory.

> 3. [FBaseString<wchar_t>::blank + 0x30] refers to an offset from a
>string template class static data member and doesn't even belong in a
>C source file. (The C file *is* being compiled my the C compiler.)
>The only reason this C file is part of the project and not a library
>call is so that I can debug it properly.

It turns out that FBaseString<wchar_t>::blank + 0x30 resolves to...

...wait for it...


Address 0x42a694! Which explains:
> 4. I have no idea what/where [0x42a698] refers to.
It's the address following FBaseString<wchar_t>::blank + 0x30!

So there we have our static storage of two unsigned integer zeros.

Thanks to Ian Collins for putting me onto setting a breakpoint on
write, which forced me to track down the address of
FBaseString<wchar_t>::blank + 0x30. Provided by going into a function
of said template and placing a watch on &blank. Then setting a write
breakpoint for 8 bytes at address 0x42a694.

As to whether it's a compiler bug or an error of mine is yet to be
determined. But address 0x42a694 lies right in the middle of a buffer
of FBaseString and is overwritten near the end of the program loop
that this code is within. Thus causing all subsequent loop passes to
fail because GetVolumeInfo doesn't return 0, 0.

I've also learned (at least while I'm using this compiler) to avoid
multiple exit points if I'm returning a structure. As a test I added
three return ret; statements to GetVolumeInfo. Each one produced:

mov eax,dword ptr [ebp+8]
mov edx,dword ptr [ebp-8]
mov dword ptr [eax],edx
mov edx,dword ptr [ebp-4]
mov dword ptr [eax+4],edx
mov eax,dword ptr [ebp+8]
jmp @12
Copying the *same* local variable to the stack for return. They may
use different registers to do it, but it's an exit point, negating any
later requirements on the register values.

The last one even has a convenient label. So instead of repeating
the 17-byte sequence each time, they should have dumped every mov
above and changed the last to jmp @8!

@8:
mov ecx,dword ptr [ebp+8]
mov eax,dword ptr [ebp-8]
mov dword ptr [ecx],eax
mov eax,dword ptr [ebp-4]
mov dword ptr [ecx+4],eax
mov eax,dword ptr [ebp+8]
@12:
pop edi
pop esi
pop ebx
mov esp,ebp
pop ebp
ret

Enough of this off topic typing. At least I think I know why some
of the strange code is implemented the way it is. And I know where
the error is. FBaseString<wchar_t>::blank is a const static value, so
it's reasonable that 0x30 farther on (0x42a694) is still in the static
area. The memory allocated for the string starts at 0x42a664. I
don't have an idea off the top of my head how to determine if it's a
bug or corrupt memory data. 42a664 could be a segment of a string,
indicating a string has written over memory manager addresses, except
this is a 16-bit Unicode project and I don't believe there are any
ASCII strings. This will be the next challenge, as initializing ret
members individually (thus avoiding the address conflict) is only
putting a band-aid on a problem that's sure to arise when I least
expect it and byte me in the ass!

Thanks for all your advice and patience!

Marcel Mueller

unread,
Feb 10, 2015, 3:21:30 AM2/10/15
to
On 09.02.15 23.38, Ian Collins wrote:
> You undoubtedly have either 1) a compiler bug or 2) a memory corrupting
> bug in your code.
>
> Case 2) would appear more likely,

Normally I would fully agree. But he is using /Borland/ C. That make 2)
more probably, so maybe it can beat 1).

I have really seen wired things with Borland C/C++ compilers. E.g. a
value is assigned to register X and to read from register Y a few
instructions later, of course without a matching MOV in between. Most of
these Bugs were related to some optimization option, but not all of
them. And it was not restricted to one compiler version or platform. I
had this kind of problems with BCOS2 as well as with different Windows
Versions.

> so either change your compile options
> to use read only literals or set a break-point on writes to those
> locations. If you can't do both of those, that's yet another reason to
> upgrade!

Fortunately even very old Borland debuggers can use hardware data
points. So observing memory should not be a serious problem.


Marcel

David Brown

unread,
Feb 10, 2015, 3:54:32 AM2/10/15
to
On 09/02/15 22:27, DSF wrote:
> On Mon, 09 Feb 2015 09:02:44 +0100, David Brown
> <david...@hesbynett.no> wrote:
>
>> On 09/02/15 05:13, DSF wrote:
>>> Hello.
>>>
>>> The following code has me completely baffled.
>>>
>>> In a header file:
>>>
>>> typedef struct tag_IOERRORS
>>> {
>>> DWORD ioerror;
>>> DWORD syserror;
>>> } IOERRORS;
>>>
>>> For our purposes, consider DWORD to be an unsigned 32-bit value.
>>>
>>> The .C file (Yes, it's a C file, I'll explain why I'm posting here
>>> soon):
>>>
>>
>> A ".C" file is normally treated as a C++ file - if you want to compile
>> as plain C, use ".c". Your compiler may use non-standard conventions,
>> but although you are asking about specific code generation you haven't
>> told us what compiler you are using.
>
> I'm working in Windows NTFS case-insensitive file system. And even
> so the compiler is always mangling the case of my names. The example,
> "GetVolumeInfo.c" was saved as shown. On the disk it is now
> "getvolumeinfo.c". Sometimes it does this, sometimes it leaves the
> case alone.
>

NTFS does not mangle case - it is a case-preserving filesystem. If you
save a file as "GetVolumeInfo.c", that's what you get on the disk.

Of course, applications (such as a compiler IDE) are free to screw up
filenames as much as they like. For example, rather than simply
overwriting an old file of similar name (same letters, different cases),
the IDE /could/ do so by opening the old file, writing the new contents,
and closing it again - then your newly saved file will (I think) take
the case of the old file.

> Also, for any item in the project list you can look up (and also
> change) what is used to process the file. For "GetVolumeInfo.c" it's
> CCompile, not CPPCompile.

Okay, so this is not an issue here. But aim for consistency and /try/
to save your C files as ".c". Maybe one day you will be using a serious
compiler and working on a serious OS.

>
> And once again, I must admit I'm using Borland C/C++ 5.01A from the
> mid '90s. I don't have time right now to learn the operation of a new
> compiler, let alone the probability of having to alter every piece of
> source code I've written.

Yet you have time to deal with the endless problems you have with such
an old tool? If you are waiting until you have a few weeks to spare,
you will wait forever - changing to a newer and better development tool
is an investment that will save you time in the future.

(I am not a fan of upgrading just for the sake of getting the latest and
greatest, and I fully understand the benefits of sticking to an old but
known tool - but there are limits!)

>
>> (Of course using ".C" for C++ is a silly idea - it is much clearer to
>> use ".cpp", but to be safe always use lower-case ".c" for C files.)
>
> (See above.)
>
>> You said you made another post with more information, but I think it got
>> lost somewhere on the way to the newsserver. Try posting again, with
>> compilable code demonstrating the problem, and information about the
>> compiler you are using.
>
> I got caught up in trying various permutations of the code and
> discovering it did something else unexpected. It will be posted soon
> after this.

Fair enough. There is a good chance that you will figure out the
problem yourself in this process - the exercise of "finding a minimal
compilable sample that demonstrates the problem" is as important to the
person with the problem as to those trying to help.

Marcel Mueller

unread,
Feb 10, 2015, 4:11:52 AM2/10/15
to
On 10.02.15 06.56, DSF wrote:
[init local copy]
> I believe this is what my compiler is automatically doing with any
> structure or array initialization. It stores the literal numeric
> values in a static area and then copies them into the structure.

Exactly.

> If I change the original to:
>
> IOERRORS ret;
>
> ret.ioerror = 0;
> ret.syserror = 0;
>
> It becomes:
> ; IOERRORS ret; // = {0, 0};
> ; ret.ioerror = 0;
> @6:
> xor eax,eax
> mov dword ptr [ebp-8],eax
> ;
> ; ret.syserror = 0;
> xor edx,edx
> mov dword ptr [ebp-4],edx
>
> Still stupid code.

Turn global register optimizations on (or however this was called at
Borland) and it will likely look more pretty. But your debugger will
dislike the result.

> And it still creates/uses a local ret and copies
> it to the LHS pointer on the stack at the end, even though there is
> only one exit point.

The code in between usually need the registers for something else or
calls functions that do not guarantee to preserver their values.

Furthermore the debugger cannot access variable values that have no
memory representation. So using a register is not an option as long as
you have debugging enabled.

> But at least I understand now; it's a
> boilerplate for initialization.
>
>> 2. The memory locations are referenced in entirely different ways.
>
> That, as has been mentioned, is in linker territory.

It is up to the implementation to use one or another method.
Independently of your code.

However, there should also be an option to place compile time constants
in the code segment. This is not that natural.
Firstly because on some platforms read and execute access requires
different permissions. Think of the NX feature, although x86 learned
this quite lately.
Secondly in old C character constants like your "This is foo!" are of
type char* rather than const char*. This has the effect that when is is
passed to a function as char* argument the function is allowed to change
the value of this constant. That is the reason why they are normally
placed in the DATA segment rather than TEXT. This behavior is no longer
valid, but your compiler might still be compatible to that.


> It turns out that FBaseString<wchar_t>::blank + 0x30 resolves to...
>
> ...wait for it...
>
> Address 0x42a694! Which explains:
>> 4. I have no idea what/where [0x42a698] refers to.
> It's the address following FBaseString<wchar_t>::blank + 0x30!
>
> So there we have our static storage of two unsigned integer zeros.

Probably. The compiler simply did not create a debugger symbol for its
internal constant. And the debugger uses the next best symbol with an
offset. This is nothing where the linker is involved. It is simple the
first symbol in the data segment of your compilation unit (.obj) and the
debugger uses the last symbol from the previous compilation unit,
unaware of compilation units.

Activate the assembler output of the compiler and you will see a
reasonable reference.


> As to whether it's a compiler bug or an error of mine is yet to be
> determined. But address 0x42a694 lies right in the middle of a buffer
> of FBaseString and is overwritten near the end of the program loop
> that this code is within.

Maybe you called free on memory not allocated before and the compiler
reused your memory.

Normally I would recommend to run your program with a memory analyzer
like valgrind. But with that old platform you may not have any option
like this.


> I've also learned (at least while I'm using this compiler) to avoid
> multiple exit points if I'm returning a structure. As a test I added
> three return ret; statements to GetVolumeInfo. Each one produced:
>
> mov eax,dword ptr [ebp+8]
> mov edx,dword ptr [ebp-8]
> mov dword ptr [eax],edx
> mov edx,dword ptr [ebp-4]
> mov dword ptr [eax+4],edx
> mov eax,dword ptr [ebp+8]
> jmp @12
> Copying the *same* local variable to the stack for return. They may
> use different registers to do it, but it's an exit point, negating any
> later requirements on the register values.

- Turn on optimizations. For debugging purposes the compiler can neither
share nor interleave code between different source lines.
- Use a recent compiler. Many things happened in between.


> The last one even has a convenient label. So instead of repeating
> the 17-byte sequence each time, they should have dumped every mov
> above and changed the last to jmp @8!

This is the common sub expression optimization. It has to be turned on
and the code has to fit into the analysis window.


> Enough of this off topic typing. At least I think I know why some
> of the strange code is implemented the way it is. And I know where
> the error is. FBaseString<wchar_t>::blank is a const static value, so
> it's reasonable that 0x30 farther on (0x42a694) is still in the static
> area. The memory allocated for the string starts at 0x42a664. I
> don't have an idea off the top of my head how to determine if it's a
> bug or corrupt memory data.

I would guess some of your code has undefined behavior and the string
buffer never should point to that area. I have no idea what FBaseString
is and whether it allows to assign a buffer from outside. This might
explain everything.
Maybe you set some string class instance to blank, retrieved its address
as C compatible char* and then used this as strcpy target. Bad idea!
I'm just guessing, of course.

Again, turn on the option to place constants in the TEXT segment and you
will get a CPU exception when an instruction wants to write to the
constant. This won't prevent you from from doing other wired things with
pointers, but it will catch at least a few cases.

As rule of thumb: do not use the type char* anywhere in your C++ code.
Use your string class or const char* only. This will prevent you from
many problems.
I did not take care of wchar_t in my post. Everything which applies to
char applies to wchar_t as well.


Marcel

DSF

unread,
Feb 11, 2015, 12:41:04 AM2/11/15
to
On Tue, 10 Feb 2015 09:54:14 +0100, David Brown
<david...@hesbynett.no> wrote:

>On 09/02/15 22:27, DSF wrote:
>> On Mon, 09 Feb 2015 09:02:44 +0100, David Brown
>> <david...@hesbynett.no> wrote:
>>

>>>
>>> A ".C" file is normally treated as a C++ file - if you want to compile
>>> as plain C, use ".c". Your compiler may use non-standard conventions,
>>> but although you are asking about specific code generation you haven't
>>> told us what compiler you are using.
>>
>> I'm working in Windows NTFS case-insensitive file system. And even
>> so the compiler is always mangling the case of my names. The example,
>> "GetVolumeInfo.c" was saved as shown. On the disk it is now
>> "getvolumeinfo.c". Sometimes it does this, sometimes it leaves the
>> case alone.
>>
>
>NTFS does not mangle case - it is a case-preserving filesystem. If you
>save a file as "GetVolumeInfo.c", that's what you get on the disk.

I was not blaming NTFS. I know it's the compiler that's doing the
case mangling.

>Of course, applications (such as a compiler IDE) are free to screw up
>filenames as much as they like. For example, rather than simply
>overwriting an old file of similar name (same letters, different cases),
>the IDE /could/ do so by opening the old file, writing the new contents,
>and closing it again - then your newly saved file will (I think) take
>the case of the old file.
>
>> Also, for any item in the project list you can look up (and also
>> change) what is used to process the file. For "GetVolumeInfo.c" it's
>> CCompile, not CPPCompile.
>
>Okay, so this is not an issue here. But aim for consistency and /try/
>to save your C files as ".c". Maybe one day you will be using a serious
>compiler and working on a serious OS.

I was only trying to capitalize C as the name of the language,
somehow I wound up typing .C as well. I don't normally (if ever) use
caps for extensions.

>>
>> And once again, I must admit I'm using Borland C/C++ 5.01A from the
>> mid '90s. I don't have time right now to learn the operation of a new
>> compiler, let alone the probability of having to alter every piece of
>> source code I've written.
>
>Yet you have time to deal with the endless problems you have with such
>an old tool? If you are waiting until you have a few weeks to spare,
>you will wait forever - changing to a newer and better development tool
>is an investment that will save you time in the future.
>
>(I am not a fan of upgrading just for the sake of getting the latest and
>greatest, and I fully understand the benefits of sticking to an old but
>known tool - but there are limits!)

Yes, I know. I don't consider getting used to a new compiler to be
as big a problem as the code rewriting and (sigh) subsequent bug
chasing. When I sit down at the computer to code (or almost anything
else), it's like I've stepped into a time machine stuck on fast
forward. I'll do something that felt like it took ten minutes only to
look at the clock and see two hours have passed. In other words: Few
weeks? Try several months!

>>
>>> (Of course using ".C" for C++ is a silly idea - it is much clearer to
>>> use ".cpp", but to be safe always use lower-case ".c" for C files.)
>>
>> (See above.)
>>
>>> You said you made another post with more information, but I think it got
>>> lost somewhere on the way to the newsserver. Try posting again, with
>>> compilable code demonstrating the problem, and information about the
>>> compiler you are using.
>>
>> I got caught up in trying various permutations of the code and
>> discovering it did something else unexpected. It will be posted soon
>> after this.
>
>Fair enough. There is a good chance that you will figure out the
>problem yourself in this process - the exercise of "finding a minimal
>compilable sample that demonstrates the problem" is as important to the
>person with the problem as to those trying to help.

I don't even remember what my last paragraph was referring to.
(Another problem impeding coding.) I did at least figure out why the
compiler's code is the way it is. (See my latest post to the original
post.) Unfortunately, fixing the problem involves memory corruption
debugging: Perhaps the only thing in coding I can see referring to as
"EVIL" and not consider it overkill! :o)

DSF

unread,
Feb 11, 2015, 3:34:21 AM2/11/15
to
On Mon, 09 Feb 2015 19:26:42 +1300, Ian Collins <ian-...@hotmail.com>
wrote:

>DSF wrote:
>>
>> Explained in my updated post to my post.
>
>Which updated post?

In my reply to your first reply to "Strange structure initialization
problem." I typed:

I replied to this last night, but I don't see it in the group.
Apologies if a second shows up!

Evidentially, it somehow got titled "off". Probably because my F
key repeats sometimes and I do a search of the post for "off" in case
it was supposed to be "of." Somehow, instead of searching, I posted.

David Brown

unread,
Feb 11, 2015, 4:05:58 AM2/11/15
to
On 11/02/15 06:40, DSF wrote:
> On Tue, 10 Feb 2015 09:54:14 +0100, David Brown
> <david...@hesbynett.no> wrote:
>
>> On 09/02/15 22:27, DSF wrote:
>>> On Mon, 09 Feb 2015 09:02:44 +0100, David Brown
>>> <david...@hesbynett.no> wrote:
>>>
>

>> NTFS does not mangle case - it is a case-preserving filesystem. If you
>> save a file as "GetVolumeInfo.c", that's what you get on the disk.
>
> I was not blaming NTFS. I know it's the compiler that's doing the
> case mangling.

Compilers don't mangle the case of filenames for source code, since they
don't write source code files. It's the IDE or editor that can do the
mangling. Keep a clear idea of the difference between an IDE and a
compiler - it will help when you change tools.

>>>
>>> And once again, I must admit I'm using Borland C/C++ 5.01A from the
>>> mid '90s. I don't have time right now to learn the operation of a new
>>> compiler, let alone the probability of having to alter every piece of
>>> source code I've written.
>>
>> Yet you have time to deal with the endless problems you have with such
>> an old tool? If you are waiting until you have a few weeks to spare,
>> you will wait forever - changing to a newer and better development tool
>> is an investment that will save you time in the future.
>>
>> (I am not a fan of upgrading just for the sake of getting the latest and
>> greatest, and I fully understand the benefits of sticking to an old but
>> known tool - but there are limits!)
>
> Yes, I know. I don't consider getting used to a new compiler to be
> as big a problem as the code rewriting and (sigh) subsequent bug
> chasing. When I sit down at the computer to code (or almost anything
> else), it's like I've stepped into a time machine stuck on fast
> forward. I'll do something that felt like it took ten minutes only to
> look at the clock and see two hours have passed. In other words: Few
> weeks? Try several months!

The sooner you invest that time, the sooner you will get a return on
your investment - and the less code you will have to check using a new tool.

>>
>> Fair enough. There is a good chance that you will figure out the
>> problem yourself in this process - the exercise of "finding a minimal
>> compilable sample that demonstrates the problem" is as important to the
>> person with the problem as to those trying to help.
>
> I don't even remember what my last paragraph was referring to.
> (Another problem impeding coding.) I did at least figure out why the
> compiler's code is the way it is. (See my latest post to the original
> post.) Unfortunately, fixing the problem involves memory corruption
> debugging: Perhaps the only thing in coding I can see referring to as
> "EVIL" and not consider it overkill! :o)

I wasn't referring to a specific post in this thread, just a general
rule of asking for help in a newsgroup like this.


DSF

unread,
Feb 11, 2015, 4:54:25 PM2/11/15
to
* It's just called Register Variables. Choices: None, Register
Keyword, and Automatic. It's set to automatic and I know it uses
them. Both from looking at assembler code and the fact that variables
can become inaccessible to the watch debugger (Variable 'c' has been
optimized and is not available.) when they go out of their real scope
in a function. Real scope meaning even though the variable has the
scope of the function from the source code side, it was stored in a
register and is no longer used, freeing the register for other use.

>> And it still creates/uses a local ret and copies
>> it to the LHS pointer on the stack at the end, even though there is
>> only one exit point.
>
>The code in between usually need the registers for something else or
>calls functions that do not guarantee to preserver their values.

With: ret = GetVolumeInfoW(path, &vi), the stack contains:
pointer to vi
pointer to path
pointer to ret
return address.

In the assembly code I write, I just have retval be the first value
on the parameter line:
PROCSTART Div64By64, retval:DWORD, a:int64, b:int64
and access the return value through retval. I don't see the need to
create a local retval at start and then copy it to the stack 'pointer
to ret' before exit. A function could avoid changing the contents of
retval under certain conditions by using a temp. But I don't think
anyone expects a variable left of the = to be left unmodified.
(Unless, of course, there is something in the standard specifying that
a LHS might not change.)


>Furthermore the debugger cannot access variable values that have no
>memory representation. So using a register is not an option as long as
>you have debugging enabled.

Mine seems to. (See above.*) But only for the time the variable
exists.
1 void Foo(void)
2 {
3 int c;
4 ...
5 for(c = 0; c < 10; c++)
6 {
7 ...
8 }
9 ...
}
Variable 'c' is "optimized and is not available" until line 6. In
source line 5 of the executable code EBX is XOR'd with itself (c=0;).
'c', in the form of EBX, is "watchable" until line 9, where EBX is
used for something else.

>> But at least I understand now; it's a
>> boilerplate for initialization.
>>
>>> 2. The memory locations are referenced in entirely different ways.
>>
>> That, as has been mentioned, is in linker territory.
>
>It is up to the implementation to use one or another method.
>Independently of your code.
>
>However, there should also be an option to place compile time constants
>in the code segment. This is not that natural.
>Firstly because on some platforms read and execute access requires
>different permissions. Think of the NX feature, although x86 learned
>this quite lately.
>Secondly in old C character constants like your "This is foo!" are of
>type char* rather than const char*. This has the effect that when is is
>passed to a function as char* argument the function is allowed to change
>the value of this constant. That is the reason why they are normally
>placed in the DATA segment rather than TEXT. This behavior is no longer
>valid, but your compiler might still be compatible to that.

That's why I explicitly used char[] = for the example. This would
require a copy.

>
>> It turns out that FBaseString<wchar_t>::blank + 0x30 resolves to...
>>
>> ...wait for it...
>>
>> Address 0x42a694! Which explains:
>>> 4. I have no idea what/where [0x42a698] refers to.
>> It's the address following FBaseString<wchar_t>::blank + 0x30!
>>
>> So there we have our static storage of two unsigned integer zeros.
>
>Probably. The compiler simply did not create a debugger symbol for its
>internal constant. And the debugger uses the next best symbol with an
>offset. This is nothing where the linker is involved. It is simple the
>first symbol in the data segment of your compilation unit (.obj) and the
>debugger uses the last symbol from the previous compilation unit,
>unaware of compilation units.

Makes sense.

>Activate the assembler output of the compiler and you will see a
>reasonable reference.

In a stand-alone compilation of GetVolumeInfo, there are three
functions that return an IOERRORS structure. All three of them have
their own separate 8 byte data segment allocation with names like
"$eiabhgka" that are not exported or provided in the debugger
information.

>
>> As to whether it's a compiler bug or an error of mine is yet to be
>> determined. But address 0x42a694 lies right in the middle of a buffer
>> of FBaseString and is overwritten near the end of the program loop
>> that this code is within.
>
>Maybe you called free on memory not allocated before and the compiler
>reused your memory.
>
>Normally I would recommend to run your program with a memory analyzer
>like valgrind. But with that old platform you may not have any option
>like this.

I asked here or clc a while back and they confirmed my search
results. Valgrind is Linux only.

It comes with CodeGuard. But CodeGuard depends on replacing the RTL
with its own version. Replacing many RTL functions with those of my
own seems to give CodeGuard problems. It has a penchant for finding
memory overruns that I cannot find in the code.

>
>> I've also learned (at least while I'm using this compiler) to avoid
>> multiple exit points if I'm returning a structure. As a test I added
>> three return ret; statements to GetVolumeInfo. Each one produced:
>>
>> mov eax,dword ptr [ebp+8]
>> mov edx,dword ptr [ebp-8]
>> mov dword ptr [eax],edx
>> mov edx,dword ptr [ebp-4]
>> mov dword ptr [eax+4],edx
>> mov eax,dword ptr [ebp+8]
>> jmp @12
>> Copying the *same* local variable to the stack for return. They may
>> use different registers to do it, but it's an exit point, negating any
>> later requirements on the register values.
>
>- Turn on optimizations. For debugging purposes the compiler can neither
>share nor interleave code between different source lines.
>- Use a recent compiler. Many things happened in between.
>
>
>> The last one even has a convenient label. So instead of repeating
>> the 17-byte sequence each time, they should have dumped every mov
>> above and changed the last to jmp @8!
>
>This is the common sub expression optimization. It has to be turned on
>and the code has to fit into the analysis window.

Enabling common sub expression optimization resulted in the address
referenced by the FBaseString... to be placed in ESI. Other than
that, the code was repeated at each return followed by a jump to the
register POPs.

>
>> Enough of this off topic typing. At least I think I know why some
>> of the strange code is implemented the way it is. And I know where
>> the error is. FBaseString<wchar_t>::blank is a const static value, so
>> it's reasonable that 0x30 farther on (0x42a694) is still in the static
>> area. The memory allocated for the string starts at 0x42a664. I
>> don't have an idea off the top of my head how to determine if it's a
>> bug or corrupt memory data.
>
>I would guess some of your code has undefined behavior and the string
>buffer never should point to that area. I have no idea what FBaseString
>is and whether it allows to assign a buffer from outside. This might
>explain everything.
>Maybe you set some string class instance to blank, retrieved its address
>as C compatible char* and then used this as strcpy target. Bad idea!
>I'm just guessing, of course.
>
>Again, turn on the option to place constants in the TEXT segment and you
>will get a CPU exception when an instruction wants to write to the
>constant. This won't prevent you from from doing other wired things with
>pointers, but it will catch at least a few cases.
>
>As rule of thumb: do not use the type char* anywhere in your C++ code.
>Use your string class or const char* only. This will prevent you from
>many problems.

That's a given. Unlike char[] or char[30], char * is a pointer to a
static string; no copying involved.

>I did not take care of wchar_t in my post. Everything which applies to
>char applies to wchar_t as well.
>
>
>Marcel

Thanks again,

DSF

unread,
Feb 12, 2015, 3:11:42 PM2/12/15
to
On Tue, 10 Feb 2015 10:11:41 +0100, Marcel Mueller
<news.5...@spamgourmet.org> wrote:


{big snip}

>> Enough of this off topic typing. At least I think I know why some
>> of the strange code is implemented the way it is. And I know where
>> the error is. FBaseString<wchar_t>::blank is a const static value, so
>> it's reasonable that 0x30 farther on (0x42a694) is still in the static
>> area. The memory allocated for the string starts at 0x42a664. I
>> don't have an idea off the top of my head how to determine if it's a
>> bug or corrupt memory data.
>
>I would guess some of your code has undefined behavior and the string
>buffer never should point to that area. I have no idea what FBaseString
>is and whether it allows to assign a buffer from outside. This might
>explain everything.
>Maybe you set some string class instance to blank, retrieved its address
>as C compatible char* and then used this as strcpy target. Bad idea!
>I'm just guessing, of course.

THANK YOU!!!!

Two parts of your paragraph were trigger phrases that lead me right
to the problem. "blank" and "strcpy target".

Back in October, Alf P. Steinbach showed me how to avoid having to
create a string array with new with code that expects a pointer to a
string it can use, such as your strcpy above.

Instead of:

FString output;

size_t outputsize;
outputsize = SPrintf(NULL, "%s: %d\n", someotherstring,
somenumber);
char tempstr = new[outputsize+1]

SPrintf(tempstr, "%s: %d\n", someotherstring, somenumber);
output = tempstr;
delete[] tempstr;

By using:

size_t outputsize;
outputsize = SPrintf(NULL, "%s: %d\n", someotherstring,
somenumber);
FString output(outputsize+1); // estab buffer size of output

SPrintf(&output[0], "%s: %d\n", someotherstring, somenumber);

Note SPrintf is not a typo. I wrote my own printf code to handle
normal and wide characters. Thought I'd increase safety by having
SPrintf return the number of characters (not bytes) if the target is
NULL. (I'm assuming 8-bit characters in the above example.)

The Actual Memory Problem:
I have been converting code in various functions that creates a
large C string at the start of the function, adds data, and finally
outputs the string at the end to use FString instead. I had this:

LPTSTR output;
output = new TCHAR[8192];
...
SPrintf(output, (LPCTSTR)hlstring, hlid, hlg.hlnumber, hlsz);
...etc.

When I changed output to a FString object, I did this:

FString output;
...
SPrintf(&output[0], (LPCTSTR)hlstring, hlid, hlg.hlnumber, hlsz);
...etc.

See the error? I forgot to give FString an initial buffer value.
And to cut down on overhead of default initializing FString members of
other classes, "FString output;" sets the string pointer member to
point to a static const unsigned int literally called "blank".

It became:

const FStringSt hlstring(TEXT("Hard Link ID: %s. Hard Links in \
group: %4d. Hard link size: %s.\n"));
FString output;
...
size_t opwidth;
opwidth = SPrintf(NULL, (LPCTSTR)hlstring, hlid, hlg.hlnumber,
hlsz);
output.IncreaseBufferLength(++opwidth);
SPrintf(&output[0], (LPCTSTR)hlstring, hlid, hlg.hlnumber, hlsz);
...etc.


>As rule of thumb: do not use the type char* anywhere in your C++ code.
>Use your string class or const char* only. This will prevent you from
>many problems.
>I did not take care of wchar_t in my post. Everything which applies to
>char applies to wchar_t as well.

I'll do you one better. I have a string template named FStringSt (A
& W) [I used to love their root beer! :o) ] to encapsulate with static
char arrays.

>Marcel

Marcel Mueller

unread,
Feb 21, 2015, 2:35:38 PM2/21/15
to
On 11.02.15 22.54, DSF wrote:
> With: ret = GetVolumeInfoW(path, &vi), the stack contains:
> pointer to vi
> pointer to path
> pointer to ret
> return address.
>
> In the assembly code I write, I just have retval be the first value
> on the parameter line:
> PROCSTART Div64By64, retval:DWORD, a:int64, b:int64
> and access the return value through retval. I don't see the need to
> create a local retval at start and then copy it to the stack 'pointer
> to ret' before exit.

AFAIK the C++ standard required this.


> Enabling common sub expression optimization resulted in the address
> referenced by the FBaseString... to be placed in ESI. Other than
> that, the code was repeated at each return followed by a jump to the
> register POPs.

:-) So the CSE optimizer didn't catch it.

>> As rule of thumb: do not use the type char* anywhere in your C++ code.
>> Use your string class or const char* only. This will prevent you from
>> many problems.
>
> That's a given. Unlike char[] or char[30], char * is a pointer to a
> static string; no copying involved.

And it is error prone when the char* pointer is passed around. That the
reason for my statement.

Since I do no longer use non constant raw pointers in my C++ programs
access violation faults were almost finally gone. Only NULL dereference
remained from time to time.


Marcel
0 new messages