I have to convert an array of around 2 million strings (numbers) to numbers. I found that str2num is very slow compared to str2double. However, when I pass the array to str2double it returns Inf; it looks as though the function is summing the data in the char vector.
Does anyone have an idea how to make str2double return a vector of all the numbers instead of just one number?
With regards,
Phil
Doesn't sound right (the summing, that is...) What's a sample of the data?
If you give str2double a cell array of valid numeric representations, it'll
return a vector.
But I'd suggest sscanf might be better for the purpose, and perhaps
faster to boot (although I've never done any testing, just the hypothesis
that direct calls to the basic I/O functions are quicker than the
higher-level ones)...
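For example, a minimal sketch of the cell-array route (the values here are made up):

```matlab
% str2double converts a cell array of strings element-wise:
v = str2double({'3.14', '42', '1e3'})   % -> [3.14 42 1000]
```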
--
Hi there and thank you for your answer.
In fact I have a matrix (M) of char (say 1000000 x 8):
'12234243'
'12234243'
'12234243'
..
'12234243'
I was using
mynum = str2double(M(:,1:8));
I'll try to have a look at sscanf.
> In fact I have a matrix (M) of char (say 1000000 x 8):
>
> '12234243'
> '12234243'
> '12234243'
> ..
> '12234243'
>
> I was using
>
> mynum = str2double(M(:,1:8));
>
> I'll try to have a look at sscanf.
You've a problem in that the intrinsic storage in ML is column-major, and
by storing the character array as a column of strings, the characters are
scanned down the columns as opposed to across the rows.
Consider the following--
>> s=['12234243';'12234243';'12234243'] % your storage pattern
s =
12234243
12234243
12234243
>> s(:)' % Note the internal order...
ans =
111222222333444222444333
And the result you had earlier, a single large number, isn't the sum of
the values; it's the result of converting the above (or similar) string to
a value, based on scanning the character array in internal memory order.
Try the following instead, or better yet, start out with cell strings
instead of a character array:
>> str2double(cellstr(s))
ans =
12234243
12234243
12234243
>>
BTW, unless it has been promoted since my release, sscanf() isn't
implemented for class cell, so the str2xxx() route is your choice there.
--
And I forgot about str2num, where you started...
It's the internal routine that takes care of the internal order problem,
and the reason it's slower than str2double is that it processes in
row-major access order to translate each row's string. Given the very
large array size, this probably causes cache misses.
I'm not sure, but the conversion to cell followed by str2double will
probably have similar performance issues.
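A rough way to check this for yourself (the size and variable names below are just for illustration; I haven't benchmarked it):

```matlab
n = 100000;                              % smaller than the real 2-million case
M = repmat('12234243', n, 1);            % char matrix, one number per row
tic; a = str2num(M);  toc                %#ok<ST2NM> row-at-a-time evaluation
tic; b = str2double(cellstr(M)); toc     % cellstr copy, then per-cell conversion
tic; c = sscanf(M.', '%8d'); toc         % one pass over the transposed characters
isequal(a, b, c)                         % all three should agree
```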
sscanf() and friends would need the data stored as a single long string
with a delimiter between fields to work correctly, or else a looping
solution.
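For instance, one way to build such a delimited string from the char matrix (a sketch, not tested against the original data):

```matlab
s = ['12234243'; '12234243'; '12234243'];
d = [s, repmat(' ', size(s,1), 1)].';    % append a blank to each row, transpose
t = sscanf(d(:).', '%d')                 % -> [12234243; 12234243; 12234243]
```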
--
Also, sscanf, as previously mentioned, will work here.
s=['12234243';'12234243';'12234243']
t = sscanf(s,'%8d')
t(2)
> Also, sscanf, as previously mentioned, will work here.
>
> s=['12234243';'12234243';'12234243']
> t = sscanf(s,'%8d')
> t(2)
Indeed; now I wonder what I did wrong to convince myself there was still a
storage order problem... I can't reproduce whatever typo I made in my
sanity check, and I sent myself off on a wild goose chase... :(
--
Your help is appreciated.
With regards,
Phil