I have to convert an array of around 2 million strings (numbers) to numbers. I found that str2num is very slow compared to str2double. However, when I pass the array to str2double it returns Inf; it looks as though the function is summing the data in the char vector.
Does anyone have an idea how to make str2double return a vector of all the numbers instead of just one number?
With regards,
Phil
Doesn't sound right (the summing, that is...) What's a sample of the data?
If you give str2double a cell array of valid numeric representations, it'll
return a vector.
But I'd suggest sscanf might be better for the purpose, and perhaps
faster to boot (although I've never done any testing, just the hypothesis
that direct calls to the basic I/O functions are quicker than the
higher-level ones)...
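For example, a minimal sketch of the cell-array route (the values here are made up):

```matlab
% str2double converts a cell array of strings element-wise:
v = str2double({'3.14', '42', '1e3'})   % -> [3.14 42 1000]
```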
--
Hi there and thank you for your answer.
In fact I have a matrix (M) of char (say 1000000 x 8):
'12234243'
'12234243'
'12234243'
..
'12234243'
I was using
mynum = str2double(M(:,1:8));
I'll try to have a look at sscanf.
> In fact I have a matrix (M) of char (say 1000000 x 8):
>
> '12234243'
> '12234243'
> '12234243'
> ..
> '12234243'
>
> I was using
>
> mynum = str2double(M(:,1:8));
>
> I'll try to have a look at sscanf.
You've a problem in that the intrinsic storage in ML is column-major, and
by storing the character array as a column of strings, the characters are
scanned down the columns as opposed to across the rows.
Consider the following--
>> s=['12234243';'12234243';'12234243'] % your storage pattern
s =
12234243
12234243
12234243
>> s(:)' % Note the internal order...
ans =
111222222333444222444333
And the result you had earlier, a single large number, isn't the sum of
the values; it's the result of converting the above (or similar) string to
a value, based on scanning the character array in internal memory order.
Try the following instead, or better yet, start out with cell strings
instead of a character array:
>> str2double(cellstr(s))
ans =
12234243
12234243
12234243
>>
BTW, unless it has been promoted since my release, sscanf() isn't
implemented for class cell, so the str2xxx() route is your choice there.
--
And I forgot about str2num, where you started...
It's the internal routine that takes care of the internal order problem,
and the reason it's slower than str2double is that it processes in
row-major access order to translate each row's string. Given the very
large array size, this probably causes cache misses.
I'm not sure, but the conversion to cell followed by str2double will
probably have similar performance issues.
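A rough way to check this for yourself (the size and variable names below are just for illustration; I haven't benchmarked it):

```matlab
n = 100000;                              % smaller than the real 2-million case
M = repmat('12234243', n, 1);            % char matrix, one number per row
tic; a = str2num(M);  toc                %#ok<ST2NM> row-at-a-time evaluation
tic; b = str2double(cellstr(M)); toc     % cellstr copy, then per-cell conversion
tic; c = sscanf(M.', '%8d'); toc         % one pass over the transposed characters
isequal(a, b, c)                         % all three should agree
```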
sscanf() and friends would need the data stored as a single long string
with a delimiter between fields to work correctly, or else a looping
solution.
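For instance, one way to build such a delimited string from the char matrix (a sketch, not tested against the original data):

```matlab
s = ['12234243'; '12234243'; '12234243'];
d = [s, repmat(' ', size(s,1), 1)].';    % append a blank to each row, transpose
t = sscanf(d(:).', '%d')                 % -> [12234243; 12234243; 12234243]
```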
--
Also, sscanf, as previously mentioned, will work here.
s=['12234243';'12234243';'12234243']
t = sscanf(s,'%8d')
t(2)
> Also, sscanf, as previously mentioned, will work here.
>
> s=['12234243';'12234243';'12234243']
> t = sscanf(s,'%8d')
> t(2)
Indeed; now I wonder what I did wrong to convince myself there was still a
storage order problem... I can't reproduce whatever typo I made in my
sanity check, and I sent myself off on a wild goose chase... :(
--
Your help is appreciated.
With regards,
Phil