On 2011-09-11 00:27, A wrote:
>> SizeOf()
>
The formula is simple and works in all versions of Delphi:
bytesize := Length(s) * SizeOf(Char);
Both "string" and "Char" are just aliases for real types.
In older Delphi this translates to
bytesize := Length(s) * SizeOf(AnsiChar);   // s is an AnsiString
In newer, Unicode Delphi (2009 and later) it translates to
bytesize := Length(s) * SizeOf(WideChar);   // s is a UnicodeString
Delphi guarantees that "string" is the default string type used by the
VCL, and that "Char" is the character type used in "string". To get the
byte size of a "string" you multiply its length by the size of a
character. You do not need to know the real size of a character: it is
one byte in ANSI Delphi and two bytes in Unicode Delphi, and it could be
something else in a future version. As long as you use SizeOf(Char),
your code will work correctly.
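
A minimal sketch of the formula as a small console program. The names
ByteSizeDemo and StringByteSize are made up for illustration; only
Length() and SizeOf() come from Delphi itself.

```pascal
program ByteSizeDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: number of bytes occupied by the character data
// of S. The terminating #0 is not counted.
function StringByteSize(const S: string): Integer;
begin
  Result := Length(S) * SizeOf(Char);
end;

var
  S: string;
begin
  S := 'Delphi';
  // Prints 6 on an ANSI Delphi (1-byte Char)
  // and 12 on a Unicode Delphi (2-byte Char).
  Writeln(StringByteSize(S));
end.
```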
> Yes, I could multiply by size of widechar, but that would convert
> fail to codepoints that have 4 bytes -
No, it will not fail. Length() of a UnicodeString does not count
characters. It counts code units (a character may take one unit or two).
So Length() multiplied by the size of one code unit is always the
number of bytes of string data, for AnsiString and UnicodeString alike.
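
To illustrate, here is a small sketch for a Unicode Delphi (2009 or
later; on an ANSI Delphi the literal below would be converted and the
numbers would differ). It stores one character outside the Basic
Multilingual Plane, U+1D11E, written as its UTF-16 surrogate pair. The
program name is made up.

```pascal
program SurrogateDemo;
{$APPTYPE CONSOLE}

var
  S: string;
begin
  // U+1D11E (musical symbol G clef) as its UTF-16 surrogate pair.
  S := #$D834#$DD1E;
  // Length counts code units, not characters: prints 2.
  Writeln(Length(S));
  // Two code units of two bytes each: prints 4.
  Writeln(Length(S) * SizeOf(Char));
end.
```

The character is one code point, but the byte size still comes out
right because both factors are expressed in code units.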
The reason is simple: the length of a string is stored in memory a few
bytes before the string data. This value is what Length() returns, and
it is also used when memory is reallocated. You can retrieve the length
manually: call PChar() to get a pointer to the data, move it back by
some offset (I do not remember how many bytes), cast it to a pointer to
Integer, and read the Integer it points to.
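
A rough sketch of that manual read. It assumes the length is a 32-bit
Integer stored immediately before the character data, which is my
understanding of the string header layout but is an implementation
detail, not a documented contract. HeaderLength is a made-up name. Do
not use this in real code, Length() is the supported way.

```pascal
program HeaderLengthDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper. ASSUMPTION: the length field is an Integer
// located SizeOf(Integer) bytes before the first character.
function HeaderLength(const S: string): Integer;
var
  P: PAnsiChar;  // used only for byte-wise pointer arithmetic
begin
  Result := 0;
  if S = '' then
    Exit;  // an empty string is a nil pointer and has no header
  P := PAnsiChar(Pointer(S));  // address of the first character
  Dec(P, SizeOf(Integer));     // step back to the assumed length field
  Result := PInteger(P)^;      // read the stored length
end;

var
  S: string;
begin
  S := 'Delphi';
  Writeln(HeaderLength(S));  // expected to match the next line
  Writeln(Length(S));
end.
```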
Note that UTF8String works in exactly the same way. Indeed, it is
implemented using the same structures and functions (as are the other
reference-counted long string types)! The only difference is the value
of the code page field in the string header.
Characters composed of two code units are indeed a problem, but not in
this case.
> Nevermind, I solved it by assigning UnicodeString to UTF8String which does
> the conversion and then doing checksum of UTF8String buffer. It seems that
> UTF8String.Length reports number of bytes of the buffer (which may be larger
> than number of characters) rather than number of Unicode characters - and
> that was exactly what I needed so UTF8 is just fine for this purpose.
Yes, because counting characters in UTF-8 is complicated, while Delphi
guarantees that Length() of a string type executes in constant time. But
your method may be slow because of the Unicode to UTF-8 conversion.
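
For comparison, a sketch of checksumming the string buffer directly,
without any conversion. ByteSum is a made-up toy checksum (a plain sum
of bytes) standing in for whatever checksum you really use.

```pascal
program ChecksumDemo;
{$APPTYPE CONSOLE}
{$Q-}  // let the sum wrap around instead of raising an overflow error

// Hypothetical toy checksum: sum of Size bytes starting at Data.
function ByteSum(Data: Pointer; Size: Integer): Cardinal;
var
  P: PAnsiChar;
  I: Integer;
begin
  Result := 0;
  P := Data;
  for I := 0 to Size - 1 do
    Result := Result + Ord(P[I]);
end;

var
  S: string;
begin
  S := 'Delphi';
  // The buffer is read in place: pointer to the data plus byte size.
  Writeln(ByteSum(Pointer(S), Length(S) * SizeOf(Char)));
end.
```

Keep in mind that this checksums the bytes of the native encoding
(UTF-16 in a Unicode Delphi), so the result will not match a checksum
taken over the UTF-8 form of the same text. Pick one and use it
consistently.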
--
Arivald