Dear all, please see my question below and let me know if you have a solution.
For example, my array [arr] = 5 2 3 1 8 3 1 2 3
There's another array [bcg], which holds the occurrence count that belongs to each element of [arr]:
[bcg] = 1 2 3 2 1 4 2 3 5
How can I get the total occurrence counts for the sorted values in [arr], as given by [bcg], like: (2+2) (2+3) (3+4+5) (0) (1) (0) (0) (1), which is equal to 4 5 12 0 1 0 0 1, meaning 4 times 1, 5 times 2, 12 times 3, 0 times 4, 1 time 5, 0 times 6, 0 times 7 and 1 time 8.
On 10/22/12 7:55 AM, Heinz Stege wrote:
> Hi Danxia,
> you didn't ask for a solution without a loop. So here is my simple
> answer:
> arr=[5,2,3,1,8,3,1,2,3]
> bcg=[1,2,3,2,1,4,2,3,5]
> sum=lonarr(max(arr)+1)
> for i=0l,n_elements(bcg)-1 do sum[arr[i]]+=bcg[i]
> print,sum[1:*]
> Cheers, Heinz
And of course, if you need a very efficient implementation of this (i.e. if your arrays have millions of elements), then read the "chunk indexing" section of JD's HISTOGRAM tutorial http://www.idlcoyote.com/tips/histogram_tutorial.html (you HAVE read JD's HISTOGRAM tutorial, right???)
-Jeremy.
On Mon, 22 Oct 2012 14:10:18 -0400, Jeremy Bailin wrote:
>On 10/22/12 7:55 AM, Heinz Stege wrote:
>> Hi Danxia,
>> [...]
>And of course, if you need a very efficient implementation of this (i.e.
>if your arrays have millions of elements), then read the "chunk
>indexing" section of JD's HISTOGRAM tutorial
>http://www.idlcoyote.com/tips/histogram_tutorial.html (you HAVE read
>JD's HISTOGRAM tutorial, right???)
Hi Jeremy,
The histogram methods in general are very smart. The reverse-indices code
is significantly faster than mine, which contains the loop. However, from
my point of view, it is not a good solution.
With very many elements in arr (and bcg) and/or big numbers in bcg,
the reverse indices array ri gets very large. The size of ri is always
greater than total(bcg), so IDL may run out of memory.
So I would say the loop can compete with the reverse indices.
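To illustrate the memory concern: one reverse-indices route whose ri array grows with total(bcg) is the expansion idiom from the "chunk indexing" section of JD's tutorial. Whether this is exactly the code being discussed here is an assumption, and the names idx and expanded are made up for this sketch.

```idl
arr=[5,2,3,1,8,3,1,2,3]
bcg=[1,2,3,2,1,4,2,3,5]
; Expansion idiom: build an index vector that holds bcg[i] copies of i.
; This histogram has total(bcg) bins, so ri has
; total(bcg)+1+n_elements(bcg) elements.
h=histogram(total(bcg,/cumulative)-1,/binsize,min=0,reverse_indices=ri)
idx=ri[0:n_elements(h)-1]-ri[0]
; Each arr[i] now appears bcg[i] times; a plain histogram counts them.
expanded=arr[idx]
sum=histogram(expanded,min=0)
print,sum[1:*]
; Compare the size of ri with total(bcg)
print,n_elements(ri),total(bcg,/integer)
```

For the example arrays this should reproduce 4 5 12 0 1 0 0 1, with ri holding 33 elements against total(bcg) = 23. With weights in the thousands and millions of elements, ri scales accordingly.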
When I wrote "simple answer", I had in mind that there must be another
solution, one without a loop. That is more "IDL-style", but it is a
little more complex:
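ii=sort(arr)                             ; indices that sort arr
sarr=arr[ii]                             ; sorted values
tot=total(bcg[ii],/cumulative,/integer)  ; running sum of the weights
; last index of each run of equal values
ii=where(sarr ne shift(sarr,-1),count)
if count eq 0 then ii=[n_elements(sarr)-1] ; only one distinct value
tot=tot[ii]                              ; running sum at the end of each run
; differences of neighbours give the per-value sums (the subtraction
; stops at the shorter operand, i.e. tot[k]-tot[k-1] for k ge 1)
if count ge 2 then tot[1:*]-=tot
; scatter the per-value sums into a dense result array
sum=lonarr(sarr[n_elements(sarr)-1]+1)
sum[sarr[ii]]=tot
;
print,sum[1:*]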
This code has moderate memory consumption and seems to be a real
alternative to both the loop method and the reverse-indices method.
A word to the developers of IDL: What about a WEIGHT keyword in the
histogram function?
print,histogram(arr,weight=bcg,/integer,min=1)
This would be nice. By the way, when I type the line above, IDL
(Version 8.0.1) says:
% Keyword INTEGER not allowed in call to: HISTOGRAM
% Error occurred at: $MAIN$
% Execution halted at: $MAIN$
No integer keyword allowed in the histogram function? Strange! ;-)
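Until such a keyword exists, a small user-written function can stand in for it. The sketch below is only an illustration: weighted_histogram is a hypothetical name, not an IDL routine, and it assumes integer values no smaller than the MIN keyword and integer weights. It simply wraps the loop method.

```idl
; Hypothetical helper, not part of IDL: adds weights[i] to the bin of
; values[i]. Assumes integer values and integer weights.
function weighted_histogram, values, weights, min=minval
  compile_opt idl2
  ; default lower bound of the first bin
  if n_elements(minval) eq 0 then minval=0L
  result=lon64arr(max(values)-minval+1)
  ; values below minval are skipped
  for i=0L,n_elements(values)-1 do $
    if values[i] ge minval then result[values[i]-minval]+=weights[i]
  return,result
end
```

Once compiled (for example from a file weighted_histogram.pro), print,weighted_histogram(arr,bcg,min=1) should give the counts 4 5 12 0 1 0 0 1 for the arrays in this thread.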
This is really great. I have learned something new again. Thank you,
Jeremy.
For the record, Jeremy's "chunk indexing" approach goes as follows:
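; one bin per integer value, starting at 0; ri receives the reverse indices
h=histogram(arr,min=0,reverse_indices=ri)
sum=lonarr(size(h,/dimensions))
; ri[ri[i]:ri[i+1]-1] are the indices of the elements of arr in bin i
for i=0l,n_elements(h)-1 do $
  if h[i] gt 0 then sum[i]=total(bcg[ri[ri[i]:ri[i+1]-1]],/integer)
print,sum[1:*]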
Small code, very fast, and low memory consumption. This is perfect.
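A quick way to check this on your own machine is to run the element-by-element loop and the reverse-indices loop on larger random data and compare results and timings. This is only a sketch: the array size, value range, weight range and seed are arbitrary choices, not taken from the posts above.

```idl
n=1000000l
seed=12345l
arr=1+long(randomu(seed,n)*50)   ; values 1..50
bcg=1+long(randomu(seed,n)*9)    ; weights 1..9
; element-by-element loop
t0=systime(/seconds)
sum1=lonarr(max(arr)+1)
for i=0l,n-1 do sum1[arr[i]]+=bcg[i]
t1=systime(/seconds)
; reverse indices: one TOTAL per occupied bin
h=histogram(arr,min=0,reverse_indices=ri)
sum2=lonarr(n_elements(h))
for i=0l,n_elements(h)-1 do $
  if h[i] gt 0 then sum2[i]=total(bcg[ri[ri[i]:ri[i+1]-1]],/integer)
t2=systime(/seconds)
print,'loop:',t1-t0,' s   reverse indices:',t2-t1,' s'
print,'results agree:',array_equal(sum1,sum2)
```

Here ri has one entry per element of arr plus one per bin plus one, so its size depends on n_elements(arr) and the value range, not on total(bcg). The last line is expected to print 1; the timings depend on the machine and the IDL version.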