>
> On 28 Mar 2012, at 14:35, Andreas Grosam wrote:
>
>> What's the preferred method to append a sequence of bytes in encoding 'encoding' (a CFStringEncoding) to a CFMutableString object?
>>
>> Well, the set of encodings which I'm interested in are the Unicode encoding schemes (UTF-8, UTF-16, UTF-16LE, UTF-16BE, etc)
>>
>> Unfortunately, there is no CFStringAppendBytes() function where I could specify the encoding.
>>
>>
> I presume that you considered CFStringAppendCString():
>
> void CFStringAppendCString (
> CFMutableStringRef theString,
> const char *cStr,
> CFStringEncoding encoding
> );
>
Yes, but:
Can I use any code unit, even though the function signature indicates that only "char" is allowed? What if I have uint32_t or uint16_t code-unit types? (I guess I can use them anyway after a type cast.) The description is not clear about this, though.
Also, consider that Unicode NULL (U+0000) is a regular character in Unicode, which conflicts with C strings, which must be zero-terminated.
So, strictly, I do NOT have C strings.
I would prefer a function like:
void CFStringAppendBytes (
    CFMutableStringRef theString,
    const void* bytes,
    CFIndex numBytes,
    CFStringEncoding encoding
);
Andreas
> Regards
>
> Jonathan Mitchell
> Mugginsoft LLP
>
>
>
> _______________________________________________
>
> Cocoa-dev mailing list (Coco...@lists.apple.com)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> https://lists.apple.com/mailman/options/cocoa-dev/agrosam%40onlinehome.de
>
> This email sent to agr...@onlinehome.de
>
> On Mar 28, 2012, at 3:58 PM, jona...@mugginsoft.com wrote:
>>>
>> I presume that you considered CFStringAppendCString():
>>
>> void CFStringAppendCString (
>> CFMutableStringRef theString,
>> const char *cStr,
>> CFStringEncoding encoding
>> );
>>
>
>
> Yes, but:
>
> Can I use any code unit, even though the function signature indicates that only "char" is allowed? What if I have uint32_t or uint16_t code-unit types? (I guess I can use them anyway after a type cast.) The description is not clear about this, though.
The fact that the function takes an encoding parameter presumably entails support for non-char code-unit types.
If not, this is a pretty sterile function.
>
> Also, consider that Unicode NULL (U+0000) is a regular character in Unicode, which conflicts with C strings, which must be zero-terminated.
A valid point. I don't know how the Unicode NULL is generally used in practice.
>
> So, strictly, I do NOT have C strings.
That was what I suspected.
Regards
Jonathan Mitchell
Mugginsoft LLP
> A valid point. I don't know how the Unicode NULL is generally used in practice.
Unicode NULL is the least of your problems. In UTF-16, each character in the normal ASCII range is going to contain a zero as one of its two bytes (which one, of course, depending on whether the encoding is big- or little-endian). CFStringAppendCString(), along with the other functions that take C strings, stops at the first zero byte it hits, which means that unless your entire file is in a non-Western script, it’s going to get cut short.
CFStringAppendCString() is not what you want if you might be using UTF-16.
Charles
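To make the truncation concrete, here is a minimal sketch in plain C. It does not call CoreFoundation at all; it only shows what any strlen()-based API would see when handed UTF-16 bytes. The function name c_string_view_length and the two byte arrays are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes a C-string API would consume: everything up to,
   but not including, the first zero byte. */
size_t c_string_view_length(const char *bytes)
{
    assert(bytes != NULL);
    return strlen(bytes);
}

/* "Hi" in UTF-16LE: 'H' = 48 00, 'i' = 69 00, then a 16-bit terminator. */
const char kHiUTF16LE[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };

/* "Hi" in UTF-16BE: 'H' = 00 48, 'i' = 00 69, then a 16-bit terminator. */
const char kHiUTF16BE[] = { 0x00, 0x48, 0x00, 0x69, 0x00, 0x00 };
```

With these arrays, the little-endian buffer is cut off after its first byte and the big-endian buffer appears empty, because its very first byte is already zero.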
That's right. The first thing CFStringAppendCString() does is call strlen().
CoreFoundation does have a function internally that would do what you want. You could file a bug report asking for a new API to match. However, it requires almost as much work as using a temporary CFString object anyway, except in some ASCII and UTF-16 cases. I would not expect your CFStringCreateWithBytesNoCopy() solution to be much slower unless you're performing a large number of short appends with one of CFString's preferred encodings.
--
Greg Parker gpa...@apple.com Runtime Wrangler
> Unicode NULL is the least of your problems. In UTF-16, each character in the normal ASCII range is going to contain a zero as one of its two bytes (which one, of course, depending on whether the encoding is big- or little-endian). CFStringAppendCString(), along with the other functions that take C strings, stops at the first zero byte it hits, which means that unless your entire file is in a non-Western script, it’s going to get cut short.
>
> CFStringAppendCString() is not what you want if you might be using UTF-16.
If you have a UTF-16 buffer, I think you can just use CFStringAppendCharacters().
-- Jean-Daniel
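CFStringAppendCharacters() takes a buffer of 16-bit UniChar code units plus a count, not raw bytes. As far as I know, it expects those code units in host byte order, so bytes labelled UTF-16BE or UTF-16LE may need converting first. The following is a minimal sketch of that conversion in plain C; the function name utf16_bytes_to_units and its parameters are hypothetical, and the actual CoreFoundation call is left out so that the sketch has no framework dependency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts numBytes bytes of UTF-16 (big- or little-endian, no BOM handling)
   into host-order 16-bit code units. 'units' must have room for numBytes / 2
   entries. Returns the number of code units written. A trailing odd byte is
   ignored here; a real caller would carry it over to the next chunk. */
size_t utf16_bytes_to_units(const unsigned char *bytes, size_t numBytes,
                            int bigEndian, uint16_t *units)
{
    assert(bytes != NULL && units != NULL);
    size_t count = numBytes / 2;
    for (size_t i = 0; i < count; i++) {
        unsigned hi = bigEndian ? bytes[2 * i]     : bytes[2 * i + 1];
        unsigned lo = bigEndian ? bytes[2 * i + 1] : bytes[2 * i];
        units[i] = (uint16_t)((hi << 8) | lo);
    }
    return count;
}
```

The resulting array and count could then be passed on as the chars and numChars arguments of CFStringAppendCharacters(). Because the length is passed explicitly, embedded U+0000 code units are not a problem on this path.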
My use case is to create CFStrings from Unicode - which in the majority of cases are "short" strings - say, less than 100 characters. In this case, I create an immutable CFString directly in one go.
Less frequently, I have to deal with larger strings, say base64-encoded images or something similar, and I still need to create a CFString. Since the content is received over the network, it naturally arrives in chunks anyway (namely NSData objects received from a connection). So, to avoid a large temporary buffer that holds the complete string, I use a smaller buffer, e.g. 4 KB, and append it to the resulting CFString repeatedly until the string is complete.
As Greg suggested, I'll try my solution first and test whether it will work efficiently:
CFStringRef tmp = CFStringCreateWithBytesNoCopy(kCFAllocatorDefault, bytes, numBytes, encoding, NO, kCFAllocatorNull);
if (tmp != NULL) {  /* NULL means the bytes could not be decoded in 'encoding' */
    CFStringAppend(myMutableString, tmp);
    CFRelease(tmp);
}
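One thing to watch with chunked input: a network chunk can end in the middle of a multi-byte character, in which case the temporary string for that chunk cannot be created from the bytes as they stand. A minimal sketch for UTF-8 follows, in plain C with no CoreFoundation dependency. The name utf8_complete_prefix is hypothetical; the idea is to append only the prefix that ends on a character boundary and carry the remaining bytes over to the front of the next chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many leading bytes of buf[0..len) end on a UTF-8 character
   boundary. The bytes after that point (at most 3) form an incomplete
   trailing sequence. This does not validate the input: malformed data is
   returned whole so that the decoder can report it. */
size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* Step back over up to 3 trailing continuation bytes (10xxxxxx). */
    size_t i = len;
    size_t back = 0;
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;                 /* empty, or continuation bytes only */

    /* buf[i - 1] should now be the lead byte of the last character. */
    unsigned char lead = buf[i - 1];
    size_t need;
    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len;                /* not a valid lead byte */

    size_t have = len - (i - 1);    /* bytes present for the last character */
    return (have < need) ? (i - 1) : len;
}
```

For the UTF-16 encodings the analogous concerns are an odd trailing byte and a high surrogate at the end of a chunk, which this sketch does not cover.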
Nonetheless, if there is a chance that a possible implementation of
void CFStringAppendBytes (
    CFMutableStringRef theString,
    const void* bytes,
    CFIndex numBytes,
    CFStringEncoding encoding
);
would be faster or use less memory, I'll also file an enhancement request.
Thanks All!
Regards
Andreas
> Thank you all for your answers and suggestions.
>
> My use case is to create CFStrings from Unicode - which in the majority of cases are "short" strings - say, less than 100 characters. In this case, I create an immutable CFString directly in one go.
>
> Less frequently in the typical use case of the application, I have to deal with larger strings, say base64 encoded images, or something like this. Yet, I need to create a CFString. Since the content will be received over the net, I naturally get content in chunks anyway (namely NSData objects received from a connection). So, in order to avoid a large temporary buffer which holds the complete string, I use a smaller buffer, e.g. 4 KByte and then append this small buffer to the resulting CFString, until it is complete.
>
If this is to store base64, I would use NSData rather than NSString. Even better, I would use a staged base64 decoder that decodes the data as it arrives and stores the result in an NSMutableData.
-- Jean-Daniel
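A staged decoder of the kind suggested above could look roughly like the following. This is a minimal sketch in plain C rather than Cocoa; the type b64_stream and the function b64_stream_feed are hypothetical names. It keeps the leftover bits between calls, so chunks may be split at any position. It silently skips padding, whitespace and any other character outside the base64 alphabet, so it does not diagnose malformed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoder state carried from one chunk to the next. Initialize to {0, 0}. */
typedef struct {
    uint32_t acc;    /* bits received but not yet emitted */
    int      nbits;  /* number of valid bits in acc (0..7 between calls) */
} b64_stream;

/* Maps a base64 character to its 6-bit value, or -1 if it is not one. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes one chunk of n characters into out, which must have room for
   n bytes. Returns the number of bytes written. */
size_t b64_stream_feed(b64_stream *s, const char *in, size_t n,
                       unsigned char *out)
{
    assert(s != NULL && in != NULL && out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        int v = b64_value((unsigned char)in[i]);
        if (v < 0)
            continue;                       /* '=', newline, etc. */
        s->acc = (s->acc << 6) | (uint32_t)v;
        s->nbits += 6;
        if (s->nbits >= 8) {
            s->nbits -= 8;
            out[written++] = (unsigned char)((s->acc >> s->nbits) & 0xFF);
            s->acc &= (1u << s->nbits) - 1; /* keep only the leftover bits */
        }
    }
    return written;
}
```

In a Cocoa application, the bytes written for each incoming chunk would then be appended to the NSMutableData, so the base64 text never has to be held in full.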