Re: Best Way to Append Bytes With Encoding To CFMutableString

Andreas Grosam

Mar 28, 2012, 10:35:43 AM
to Cocoa-Dev List

On Mar 28, 2012, at 3:58 PM, jona...@mugginsoft.com wrote:

>
> On 28 Mar 2012, at 14:35, Andreas Grosam wrote:
>
>> What's the preferred method to append a sequence of bytes in encoding 'encoding' (a CFStringEncoding) to a CFMutableString object?
>>
>> Well, the set of encodings which I'm interested in are the Unicode encoding schemes (UTF-8, UTF-16, UTF-16LE, UTF-16BE, etc)
>>
>> Unfortunately, there is no CFStringAppendBytes() function where I could specify the encoding.
>>
>>
> I presume that you considered CFStringAppendCString():
>
> void CFStringAppendCString (
> CFMutableStringRef theString,
> const char *cStr,
> CFStringEncoding encoding
> );
>


Yes, but:

Can I use any code-unit type, even though the function signature indicates that only "char" is allowed? What if I have uint32_t or uint16_t code-unit types? (I guess I can use them anyway after a type cast.) The description is not clear about this, though.

Also, consider that Unicode NULL (U+0000) is a regular character in Unicode, which conflicts with C strings, which must be zero-terminated.

So, strictly, I do NOT have C strings.


I would prefer a function like:

void CFStringAppendBytes (
CFMutableStringRef theString,
const void* bytes,
CFIndex numBytes,
CFStringEncoding encoding
);


Andreas


> Regards
>
> Jonathan Mitchell
> Mugginsoft LLP
>
>
>
> _______________________________________________
>
> Cocoa-dev mailing list (Coco...@lists.apple.com)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> https://lists.apple.com/mailman/options/cocoa-dev/agrosam%40onlinehome.de
>
> This email sent to agr...@onlinehome.de


jona...@mugginsoft.com

Mar 28, 2012, 10:55:55 AM
to coco...@lists.apple.com

On 28 Mar 2012, at 15:35, Andreas Grosam wrote:

>
> On Mar 28, 2012, at 3:58 PM, jona...@mugginsoft.com wrote:
>>>
>> I presume that you considered CFStringAppendCString():
>>
>> void CFStringAppendCString (
>> CFMutableStringRef theString,
>> const char *cStr,
>> CFStringEncoding encoding
>> );
>>
>
>
> Yes, but:
>
> Can I use any code-unit type, even though the function signature indicates that only "char" is allowed? What if I have uint32_t or uint16_t code-unit types? (I guess I can use them anyway after a type cast.) The description is not clear about this, though.

The fact that the function supports an encoding parameter presumably entails support for non-char code-unit types.
If not, this is a pretty sterile function.

>
> Also, consider that Unicode NULL (U+0000) is a regular character in Unicode, which conflicts with C strings, which must be zero-terminated.

A valid point. I don't know how the Unicode NULL is generally used in practice.

>
> So, strictly, I do NOT have C strings.

That was what I suspected.

Regards

Jonathan Mitchell
Mugginsoft LLP


Charles Srstka

Mar 28, 2012, 2:00:51 PM
to jona...@mugginsoft.com, coco...@lists.apple.com
On Mar 28, 2012, at 9:55 AM, jona...@mugginsoft.com wrote:

> On 28 Mar 2012, at 15:35, Andreas Grosam wrote:
>
>>
>> On Mar 28, 2012, at 3:58 PM, jona...@mugginsoft.com wrote:
>>>>
>>> I presume that you considered CFStringAppendCString():
>>>
>>> void CFStringAppendCString (
>>> CFMutableStringRef theString,
>>> const char *cStr,
>>> CFStringEncoding encoding
>>> );
>>>
>>
>>
>> Yes, but:
>>
>> Can I use any code-unit type, even though the function signature indicates that only "char" is allowed? What if I have uint32_t or uint16_t code-unit types? (I guess I can use them anyway after a type cast.) The description is not clear about this, though.
> The fact that the function supports an encoding parameter presumably entails support for non-char code-unit types.
> If not, this is a pretty sterile function.
>
>>
>> Also, consider that Unicode NULL (U+0000) is a regular character in Unicode, which conflicts with C strings, which must be zero-terminated.
> A valid point. I don't know how the Unicode NULL is generally used in practice.

Unicode NULL is the least of your problems. In UTF16, each character in the normal ASCII range is going to contain a zero as one of its two bytes (which one, of course, depending on whether the encoding is big- or little-endian). CFStringAppendCString(), along with the other functions that take C strings, stops at the first zero byte it hits, which means that unless your entire file is in a non-Western script, it’s going to get cut short.

CFStringAppendCString() is not what you want if you might be using UTF16.

Charles

Greg Parker

Mar 28, 2012, 3:24:36 PM
to Charles Srstka, coco...@lists.apple.com
On Mar 28, 2012, at 11:00 AM, Charles Srstka <coco...@charlessoft.com> wrote:
> Unicode NULL is the least of your problems. In UTF16, each character in the normal ASCII range is going to contain a zero as one of its two bytes (which one, of course, depending on whether the encoding is big- or little-endian). CFStringAppendCString(), along with the other functions that take C strings, stops at the first zero byte it hits, which means that unless your entire file is in a non-Western script, it’s going to get cut short.
>
> CFStringAppendCString() is not what you want if you might be using UTF16.

That's right. The first thing CFStringAppendCString() does is call strlen().

CoreFoundation does have a function internally that would do what you want. You could file a bug report asking for a new API to match. However, it requires almost as much work as using a temporary CFString object anyway, except in some ASCII and UTF-16 cases. I would not expect your CFStringCreateWithBytesNoCopy() solution to be much slower unless you're performing a large number of short appends with one of CFString's preferred encodings.


--
Greg Parker gpa...@apple.com Runtime Wrangler
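
To illustrate the strlen() point, here is a minimal sketch in plain C (no CoreFoundation involved). The byte arrays and the helper name are made up for the example; it only shows how many bytes a strlen()-based function would see when handed UTF-16 data.

```c
#include <stdint.h>
#include <string.h>

/* "Hi" as UTF-16LE bytes, followed by a U+0000 terminator. */
static const uint8_t kHiUTF16LE[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };
/* The same text as UTF-16BE bytes. */
static const uint8_t kHiUTF16BE[] = { 0x00, 0x48, 0x00, 0x69, 0x00, 0x00 };

/* Hypothetical helper: the number of bytes any strlen()-based API
   would consider to be part of the string. */
static size_t bytes_seen_as_c_string(const uint8_t *bytes) {
    return strlen((const char *)bytes);
}
```

For the little-endian buffer the helper reports 1 byte (just the 0x48 of 'H'); for the big-endian buffer it reports 0, because the very first byte is already zero.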

Jean-Daniel Dupas

Mar 28, 2012, 3:52:11 PM
to Charles Srstka, Cocoa-Dev List

On 28 Mar 2012, at 20:00, Charles Srstka wrote:

> On Mar 28, 2012, at 9:55 AM, jona...@mugginsoft.com wrote:
>
>> On 28 Mar 2012, at 15:35, Andreas Grosam wrote:
>>
>>>
>>> On Mar 28, 2012, at 3:58 PM, jona...@mugginsoft.com wrote:
>>>>>
>>>> I presume that you considered CFStringAppendCString():
>>>>
>>>> void CFStringAppendCString (
>>>> CFMutableStringRef theString,
>>>> const char *cStr,
>>>> CFStringEncoding encoding
>>>> );
>>>>
>>>
>>>
>>> Yes, but:
>>>
>>> Can I use any code-unit type, even though the function signature indicates that only "char" is allowed? What if I have uint32_t or uint16_t code-unit types? (I guess I can use them anyway after a type cast.) The description is not clear about this, though.
>> The fact that the function supports an encoding parameter presumably entails support for non-char code-unit types.
>> If not, this is a pretty sterile function.
>>
>>>
>>> Also, consider that Unicode NULL (U+0000) is a regular character in Unicode, which conflicts with C strings, which must be zero-terminated.
>> A valid point. I don't know how the Unicode NULL is generally used in practice.
>
> Unicode NULL is the least of your problems. In UTF16, each character in the normal ASCII range is going to contain a zero as one of its two bytes (which one, of course, depending on whether the encoding is big- or little-endian). CFStringAppendCString(), along with the other functions that take C strings, stops at the first zero byte it hits, which means that unless your entire file is in a non-Western script, it’s going to get cut short.
>
> CFStringAppendCString() is not what you want if you might be using UTF16.


If you have a UTF-16 buffer, I think you can just use CFStringAppendCharacters().

-- Jean-Daniel
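
CFStringAppendCharacters() takes an array of UniChar (16-bit code units in host byte order) and a count of code units, not bytes. So a raw byte buffer in an explicit byte order (UTF-16BE or UTF-16LE) would first need converting. A minimal sketch in plain C, where the helper name is hypothetical and uint16_t stands in for UniChar:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-16 bytes with an explicit byte order
   into native 16-bit code units. `units` must have room for numBytes / 2
   entries. Returns the number of code units written; a trailing odd byte
   is ignored and would have to be carried over by the caller. */
static size_t utf16_bytes_to_units(const uint8_t *bytes, size_t numBytes,
                                   int bigEndian, uint16_t *units) {
    size_t count = numBytes / 2;
    for (size_t i = 0; i < count; i++) {
        uint8_t b0 = bytes[2 * i];
        uint8_t b1 = bytes[2 * i + 1];
        units[i] = bigEndian ? (uint16_t)((b0 << 8) | b1)
                             : (uint16_t)((b1 << 8) | b0);
    }
    return count;
}
```

The returned count is what would be passed as the number of characters; surrogate pairs need no special handling here, since they are simply two code units.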

Andreas Grosam

Mar 31, 2012, 12:20:56 PM
to Cocoa-Dev List
Thank you all for your answers and suggestions.

My use case is to create CFStrings from Unicode - which in the majority of cases are "short" strings - say, less than 100 characters. In this case, I create an immutable CFString directly in one go.

Less frequently in the typical use case of the application, I have to deal with larger strings, say base64 encoded images, or something like this. Yet, I need to create a CFString. Since the content will be received over the net, I naturally get content in chunks anyway (namely NSData objects received from a connection). So, in order to avoid a large temporary buffer which holds the complete string, I use a smaller buffer, e.g. 4 KByte and then append this small buffer to the resulting CFString, until it is complete.

As Greg suggested, I'll try my solution first and test whether it will work efficiently:

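// kCFAllocatorNull: the temporary string must not free `bytes`; the buffer stays owned by the caller.
CFStringRef tmp = CFStringCreateWithBytesNoCopy(kCFAllocatorDefault, bytes, numBytes, encoding, false, kCFAllocatorNull);
if (tmp != NULL) {  // creation can fail, e.g. if the bytes are not valid in `encoding`
    CFStringAppend(myMutableString, tmp);
    CFRelease(tmp);
}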


Nonetheless, if there is a chance that a possible implementation of

void CFStringAppendBytes (
CFMutableStringRef theString,
const void* bytes,
CFIndex numBytes,
CFStringEncoding encoding
);


would be faster or use less memory, I'll also file an enhancement request.

Thanks All!

Regards
Andreas
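
One thing to watch with fixed-size chunks of UTF-8: a chunk boundary can fall in the middle of a multi-byte sequence, and a temporary string created from that chunk alone would presumably fail to decode. A minimal sketch in plain C of one way to handle it, assuming the caller carries the leftover bytes over to the next chunk; the helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return how many leading bytes of a UTF-8 chunk can
   be decoded now. Any bytes after that belong to a sequence that is still
   incomplete and should be prepended to the next chunk. Invalid input is
   passed through unchanged so that the real decoder can reject it. */
static size_t utf8_complete_prefix(const uint8_t *bytes, size_t len) {
    size_t i = len;
    size_t back = 0;
    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (bytes[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) return len;              /* no lead byte found in the chunk */

    uint8_t lead = bytes[i - 1];
    size_t need;
    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len;                     /* not a valid lead byte */

    size_t have = len - (i - 1);         /* bytes present from lead to end */
    return (have < need) ? (i - 1) : len;
}
```

For base64 content this never triggers, since every character is a single ASCII byte; it matters only for text containing non-ASCII characters.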

Jean-Daniel Dupas

Mar 31, 2012, 1:12:39 PM
to Andreas Grosam, Cocoa-Dev List

On 31 Mar 2012, at 18:20, Andreas Grosam wrote:

> Thank you all for your answers and suggestions.
>
> My use case is to create CFStrings from Unicode - which in the majority of cases are "short" strings - say, less than 100 characters. In this case, I create an immutable CFString directly in one go.
>
> Less frequently in the typical use case of the application, I have to deal with larger strings, say base64 encoded images, or something like this. Yet, I need to create a CFString. Since the content will be received over the net, I naturally get content in chunks anyway (namely NSData objects received from a connection). So, in order to avoid a large temporary buffer which holds the complete string, I use a smaller buffer, e.g. 4 KByte and then append this small buffer to the resulting CFString, until it is complete.
>

If this is to store base64, I would use NSData rather than NSString. Even better, I would use a staged base64 decoder that lets me decode the data as it arrives and stores the result in an NSMutableData.

-- Jean-Daniel
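
A minimal sketch of what such a staged decoder could look like in plain C. The type and function names are hypothetical, and the sketch skips any character outside the base64 alphabet rather than reporting an error. The state struct carries the leftover bits from one chunk to the next, so chunks can be split at any position.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state carried across chunks. */
typedef struct {
    uint32_t acc;    /* bits collected but not yet emitted */
    int      nbits;  /* number of valid bits in acc (0, 2, 4 or 6) */
    int      done;   /* set once '=' padding has been seen */
} B64Stream;

static void b64_init(B64Stream *s) { s->acc = 0; s->nbits = 0; s->done = 0; }

/* Map one base64 character to its 6-bit value, or -1 if it is not one. */
static int b64_value(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode one chunk of base64 text. `out` must have room for at least
   `len` bytes. Returns the number of decoded bytes written. */
static size_t b64_feed(B64Stream *s, const char *in, size_t len, uint8_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < len && !s->done; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c == '=') { s->done = 1; break; }   /* padding ends the stream */
        int v = b64_value(c);
        if (v < 0) continue;                    /* skip line breaks etc. */
        s->acc = (s->acc << 6) | (uint32_t)v;
        s->nbits += 6;
        if (s->nbits >= 8) {                    /* a full byte is available */
            s->nbits -= 8;
            out[n++] = (uint8_t)(s->acc >> s->nbits);
            s->acc &= (1u << s->nbits) - 1u;    /* keep only leftover bits */
        }
    }
    return n;
}
```

Each received chunk would be passed to b64_feed(), and the bytes it returns appended to the mutable data object, so the base64 text itself never needs to be held in full.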
