> The above for loop would make unicode_string look like this:
> "e null a null s null y null"
> Is there a faster way to do this... in place maybe?
Depends what you want.
It /seems/ that you're assuming a little-endian architecture, and that the intent is to treat unicode_string as UTF-16 encoded (via some low level cast), and that you're assuming that the original character encoding is Latin-1 or a subset.
That's an awful lot of assumptions.
Look in the standard library for mbstowcs or something like that, in the C library, or the 'widen' functions (e.g. std::ctype<wchar_t>::widen) in the C++ library.
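As a rough illustration of the C library route, here is a minimal sketch built on std::mbstowcs. The helper name widen_with_locale is made up for this example, and the result depends on the current C locale; in the default "C" locale only plain ASCII is guaranteed to convert.

```cpp
#include <cstdlib>   // std::mbstowcs
#include <string>

// Hypothetical helper (not from the thread): converts a multibyte string
// to wide characters according to the current C locale.
std::wstring widen_with_locale(const std::string& narrow)
{
    // The wide result never has more elements than the input has bytes.
    std::wstring wide(narrow.size(), L'\0');
    const std::size_t n =
        std::mbstowcs(&wide[0], narrow.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();   // invalid multibyte sequence for this locale
    }
    wide.resize(n);              // drop the unused tail
    return wide;
}
```

Note that this sketch returns an empty string on a conversion error, so a caller that needs to tell an error apart from empty input would want a different interface.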
Under what seems to be your assumption of Latin-1 encoding of the 'char' string, and an additional assumption of 16-bit 'wchar_t', you can however do
#include <iostream>
#include <string>
#include <limits.h>
using namespace std;

#define STATIC_ASSERT( x ) typedef char shouldBeTrue[(x)? 1 : -1]
> SG <s.gesem...@gmail.com>, on 05/09/2010 10:17:37, wrote:
> > On 5 Sep., 19:05, "Francesco S. Carta" wrote:
> >> in place, yes: use the std::string::insert() method.
> > Or better yet, resize() to final size, assign the non-null characters in a backwards loop and set a couple of chars to zero:
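The code that followed is not reproduced here. A minimal sketch of the approach as described (resize first, then walk backwards) could look like this; the function name widen_in_place is an assumption, not SG's.

```cpp
#include <string>

// Hypothetical name. Puts a '\0' after every original character, in place.
void widen_in_place(std::string& s)
{
    const std::string::size_type n = s.size();
    s.resize(n * 2);                  // the new tail is filled with '\0'
    // Walk backwards so no character is overwritten before it is moved:
    // the write positions 2*i and 2*i+1 are never below the read position i.
    for (std::string::size_type i = n; i-- > 0; ) {
        s[2 * i]     = s[i];          // character moves to its even slot
        s[2 * i + 1] = '\0';          // zero byte follows it
    }
}
```

Each character is moved exactly once, so the work is linear in the length of the string.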
> Marc <marc.gli...@gmail.com>, on 05/09/2010 10:54:46, wrote:
>> Faster. SG's code has linear complexity and yours is quadratic.
>> Readability is something else...
> Exactly. So neither is better than the other unless we associate "better" to "more readable" or to "faster" ;-)
Just for the record, a better solution, in my opinion, is to build an appropriately sized new string and copy the original chars to the appropriate positions - something of a compromise between readability and speed:
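// Assumes #include <string> and using namespace std;
void foo(string& s)
{
    string r(s.size() * 2, '\0');   // final size, pre-filled with zeros
    for (string::size_type i = 0, e = s.size(); i < e; ++i) {
        r[i * 2] = s[i];            // original chars land on even indices
    }
    s.swap(r);                      // constant-time exchange, no copy back
}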
ASSUMING that the OP really wants exactly this - WRT Alf P. Steinbach's notes in the other post.
SG <s.gesem...@gmail.com>, on 05/09/2010 11:53:58, wrote:
> On 5 Sep., 20:04, Francesco S. Carta wrote:
>> Marc wrote:
>>> On 5 sep, 19:27, Francesco S. Carta wrote:
>>>> Define "better".
>>> Faster. [...]
>> Exactly. So neither is better than the other unless we associate "better" to "more readable" or to "faster" ;-)
> See the original post:
> "...Is there a faster way to do this..."
Of course, I was just playing at nitpicking after your overzealous snip - see my further post ;-)
> On 2010-09-06 10:56, Francesco S. Carta wrote:
>> tni <t...@example.invalid>, on 06/09/2010 10:37:08, wrote:
>>> On 2010-09-05 20:04, Francesco S. Carta wrote:
>>>>> Faster. SG's code has linear complexity and yours is quadratic.
>>>>> Readability is something else...
>>>> Exactly. So neither is better than the other unless we associate "better" to "more readable" or to "faster" ;-)
>>> Unnecessary quadratic code is a bug (unless you have guarantees on the input size).
>> That was a deliberately slow implementation - see all the other posts.
> My point isn't that the implementation is a bit slower, it's wrong and should never be used. There is no question whether one of the two is better.
> Feed your quadratic implementation a 10MB string and it will literally run for hours.
You're right, of course, and finally somebody posted the correct, explicit objection to the first response of mine, which was over-zealously half-snipped by SG:
"Faster, I don't know (measure it), in place, yes: use the std::string::insert() method."
My purpose was to push the OP to do all the testing and reasoning.
But the OP disappeared and the group took about ten posts to get to this point. I won't post bait like this anymore, just to save my time :-)
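For reference, the insert()-based suggestion might be written as below. This is only a sketch with a made-up function name, shown to make the cost visible rather than as something to use: every insert() shifts the whole remaining tail of the string, so the loop performs on the order of n*n/2 character moves for an input of n characters (roughly 5 * 10^13 for a 10 MB string).

```cpp
#include <string>

// Hypothetical name. Correct result, but quadratic: each insert()
// moves every character that follows the insertion point.
void widen_with_insert(std::string& s)
{
    // s.size() grows by one per iteration, so the bound is re-read each time.
    for (std::string::size_type i = 1; i <= s.size(); i += 2) {
        s.insert(i, 1, '\0');   // one zero byte after each original char
    }
}
```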
Goran Pusic <gor...@cse-semaphore.com>, on 06/09/2010 03:15:21, wrote:
> Guys, aren't you a bit misleading with iterators and big-O and stuff? ;-)
My bad. I intentionally posted a wrong suggestion without clearly marking it as such - I thought I was going to be castigated immediately, but since the punishment didn't come at once, I kept it on to see what was going to happen... now I realize that it wasn't all that fun for the others, so I present my apologies to the group for the wasted time.
The sad thing is, I've seen quadratic stuff like that far too often in production code. There certainly are more than enough (clueless) people who wouldn't consider it a joke.
tni <t...@example.invalid>, on 06/09/2010 14:33:13, wrote:
> The sad thing is, I've seen quadratic stuff like that far too often in production code. There certainly are more than enough (clueless) people who wouldn't consider it a joke.
Indeed. And since it seems that most of those programmers have no idea what O() notation is, simply saying that an algorithm has linear or quadratic complexity is not enough to make the point clear.
Unfortunately it takes more words (and practical examples such as the one you posted in the other branch) and a big fat "quadratic is BAD" sign three meters on each side, but it's worth the effort.
Busy weekend. Thanks for the replies (all of you).
Why do such a thing?
Oversimplified, but here is why: a Microsoft NT hash is the MD4 digest of a string encoded as I described (UTF-16LE). So if I wanted to create an NT hash for the word "easy" and I had a std::string the process would look something like this:
My initial post confused the issue by using the term "unicode" in a variable name (although this is a form of unicode (utf-16le) encoding). And no, by default wstring doesn't magically do this.
--------------------------------------
insert works fine for in place. Faster than my for loop? Does not seem so. I'm not sure it needs to be any faster. I was just wondering. Maybe my approach is slow. Obviously, the encoding is an additional step to perform (plain MD4 on std::string is faster than encode std::string then MD4... but not a whole lot). BTW, these strings would never be 10 megs ;)
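The encoding step described above can be sketched on its own. The version below writes the two bytes of each code unit explicitly, so it does not depend on the byte order of the machine. The helper name is hypothetical, the input is assumed to be 7-bit ASCII, and the MD4 step is left out because the standard library has no MD4 function.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: serializes text as UTF-16LE bytes, ready to be
// fed to a hash function.
// ASSUMPTION: every input byte is below 0x80 (plain ASCII).
std::vector<unsigned char> ascii_to_utf16le(const std::string& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);          // single allocation
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        bytes.push_back(static_cast<unsigned char>(text[i]));  // low byte
        bytes.push_back(0);                                    // high byte
    }
    return bytes;
}
```

For "easy" this yields the eight bytes 'e', 0, 'a', 0, 's', 0, 'y', 0.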
> That's about it. Thanks again.
You're welcome.
Your initial implementation is not slow; it's fine. My only (serious) suggestion is to use something like the second implementation I presented: it creates an appropriately sized string up front, so there is a single allocation, and it uses std::string::swap() to avoid an extra copy from the working string back to the original one.
Forget about the insert() method for this kind of job on containers such as std::vector and std::string, because inserting in the middle shifts every following element and leads to very poor performance, but remember it for containers like std::list, where insertion is cheap.
> Busy weekend. Thanks for the replies (all of you).
> Why do such a thing?
> Over simplified, but here is why: A Microsoft NT hash is a string (encoded as I described (utf-16le)) with MD4 then applied. In short, that produces an NT hash. So if I wanted to create an NT hash for the word "easy" and I had a std::string the process would look something like this:
> My initial post confused the issue by using the term "unicode" in a variable name (although this is a form of unicode (utf-16le) encoding). And no, by default wstring doesn't magically do this.
OK, with more information more can be said.
First, in Windows, conversion to wstring like
string s = "blah blah"; wstring u( s.begin(), s.end() );
produces exactly the byte sequence that you're laboriously creating.
You're right that no magic is involved. However, no magic is necessary.
That said, second, Windows ANSI Western is a superset, not a subset, of Latin 1, and so this zero-insertion procedure does not in general produce UTF-16.
E.g., if s contains a '€' Euro sign you won't get proper UTF-16. If you need proper UTF-16 you should use one of the conversion functions available in the standard library, or alternatively in the Windows API.
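The Euro example can be checked with a few lines. In Windows-1252 the Euro sign is the single byte 0x80, while its Unicode code point is U+20AC, so zero-extending the byte gives the wrong UTF-16 code unit. The sketch below uses a hypothetical helper name and C++11's std::u16string to make the 16-bit width explicit.

```cpp
#include <string>

// Hypothetical helper: zero-extends every byte to one 16-bit code unit,
// which is what the zero-insertion procedure amounts to.
std::u16string zero_extend(const std::string& bytes)
{
    std::u16string units;
    units.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        units.push_back(
            static_cast<char16_t>(static_cast<unsigned char>(bytes[i])));
    }
    return units;
}
```

Feeding it the Windows-1252 byte 0x80 produces the code unit 0x0080, a control character, not 0x20AC. Plain ASCII input comes out unchanged, which is why the shortcut appears to work on simple test strings.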
Third, I've already given this advice and it gets tiresome repeating it.