
Extension to BinLists Function


Don

Jan 5, 2012, 5:59:04 AM
Hello,

The documentation shows BinLists binning one-dimensional vectors
of numbers, as in the following
example:

data = {1,3,2,1,4,5,6,2};
breakPoints = {-Infinity,2,5,7,Infinity};

BinLists[data, {breakPoints}]

which returns:

{{1, 1}, {3, 2, 4, 2}, {5, 6}, {}}

I would like to bin entire sublists of data
of arbitrary depth, as in the following
example, where every sublist has two elements:

data1 = Transpose[{data, Table[Random[],{Length[data]}]}]

which gives the following value for data1:

{{1,0.936229},{3,0.128096},{2,0.393583},{1,0.301525},{4,0.503822},{5,0.253597},{6,0.0835316},{2,0.0068356}}

In this simple example, the sublists are binned based on the value of the first element
of every sublist.

The result, using the same breakpoints (this time applied to the first
element of every sublist as in the example above),
should be:


{{{1,0.936229},{1,0.301525}},{{3,0.128096},{2,0.393583},{4,0.503822},{2,0.0068356}},{{5,0.253597},{6,0.0835316}},{}}


The binLists function below does this job,
but it uses brute force in the form of two
nested For loops.
Is there a more efficient way of binning
sublists of arbitrary depth?

Thank you.

Don

==========================================

For the second example above, which uses the
binLists function defined below, the inputs to the binLists
function are:

array = data1
breakPts = {2, 5, 7}
pos = {1}


binLists[data1, breakPts, pos]

returns

{{{1,0.936229},{1,0.301525}},{{3,0.128096},{2,0.393583},{4,0.503822},{2,0.0068356}},{{5,0.253597},{6,0.0835316}},{}}

which is the correct result.

===============================

Definition of binLists:

Remove[binLists ];

binLists[array_List, breakPts_List, pos_List:{} ] :=
Module[{breakPtIntervalV, nIntervals, bins, elemV, j, k},



breakPtIntervalV= Partition[Join[{-Infinity},breakPts,{Infinity}], 2, 1];

nIntervals = Length[breakPtIntervalV];

bins = Table[{},{nIntervals}];

(*
elemV holds the element from each sublist in array
that binning is to be a function of
*)

If[Length[pos] > 0,
elemV = #[[Apply[Sequence, pos]]]& /@ array,
elemV = array
];(* If Length *)

For[j = 1, j<= Length[array], ++j,

For[k=1, k<=nIntervals, ++k,

If[
elemV[[j]] >= breakPtIntervalV[[k,1]] &&
elemV[[j]] < breakPtIntervalV[[k,2]],
AppendTo[bins[[k]], array[[j]]];
Break[] (* each element belongs to exactly one bin, so stop scanning *)
]


];(* For k *)

];(* For j *)

Return[bins]

](* End Module binLists *)

Bob Hanlon

Jan 6, 2012, 4:21:08 AM
breakPoints = {-Infinity, 2, 5, 7, Infinity};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
{4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

As stated in the documentation, BinLists handles multidimensional data:

First /@ BinLists[data1, {breakPoints}, {{-Infinity, Infinity}}]

{{{1, 0.936229}, {1, 0.301525}}, {{3, 0.128096}, {2, 0.393583}, {4,
0.503822}, {2, 0.0068356}}, {{5, 0.253597}, {6, 0.0835316}}, {}}
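
To see why the First /@ is needed, here is a minimal sketch that inspects the raw BinLists result. The name raw is introduced here for illustration only. With two bin specifications, BinLists returns a two-dimensional array of bins; because the second dimension has a single all-encompassing bin, each row should hold exactly one bin, which First extracts. On versions before 8.0.4 the infinite bounds may need to be replaced with large finite values, as discussed later in the thread.

```mathematica
breakPoints = {-Infinity, 2, 5, 7, Infinity};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
   {4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

(* raw is a hypothetical name: one row per x-bin, one column for the single y-bin *)
raw = BinLists[data1, {breakPoints}, {{-Infinity, Infinity}}];

Length /@ raw           (* number of y-bins in each x-bin row *)
Map[Length, raw, {2}]   (* number of points in each individual bin *)

(* taking First of each row drops the redundant nesting level *)
First /@ raw
```

The two Length calls make the extra level of nesting visible before it is stripped.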


Bob Hanlon

Heike Gramberg

Jan 6, 2012, 4:22:41 AM
For your second example you could do something like

index[bp_] := Function[{x}, Evaluate@Piecewise@ MapIndexed[{#2[[1]], #1[[1]] <= x < #1[[2]]} &, Partition[bp, 2, 1]]]

bins[lst_, breakPoints_] := With[{fx = index[breakPoints]},
Flatten[#, 1] & /@ Reap[Sow[#, fx[#[[1]]]]; & /@ lst, Range[Length[breakPoints] - 1]][[2]]]

then bins[data1, breakPoints] returns

{{{1, 0.936229}, {1, 0.301525}}, {{3, 0.128096}, {2, 0.393583}, {4, 0.503822}, {2, 0.0068356}}, {{5, 0.253597}, {6, 0.0835316}}, {}}

Here, index[] is just a helper function such that index[breakPoints][x] returns the index of the bin that x belongs to.
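
As a quick check of the helper, the following sketch applies index[breakPoints] to a few sample values. It repeats the definition above so that it is self-contained, and it assumes that x has no global value (Evaluate would otherwise substitute it). The sample values are chosen here for illustration and are not from the original post.

```mathematica
(* same definition as above, repeated so the example stands alone *)
index[bp_] := Function[{x},
  Evaluate@Piecewise@
    MapIndexed[{#2[[1]], #1[[1]] <= x < #1[[2]]} &, Partition[bp, 2, 1]]]

breakPoints = {-Infinity, 2, 5, 7, Infinity};

(* each bin is closed on the left and open on the right,
   so 2 lands in the second bin and 7 in the fourth *)
index[breakPoints] /@ {1, 2, 4.5, 6, 7}
```

With break points that do not extend to infinity, a value outside every interval would fall through to the Piecewise default of 0, which is not among the tags listed in the Reap call.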

Heike.


Don

Jan 7, 2012, 5:21:11 AM
Thank you Bob for your response to my problem.

I was unable to get a correct answer in exactly the way
you have formulated it.

When I do


breakPoints = {-Infinity, 2, 5, 7, Infinity};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
{4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

BinLists[data1, {breakPoints}, {{-Infinity, Infinity}}]

I get an error message which says:

Interpolation::indat: "Data point {-\[Infinity], 0} contains abscissa -\[Infinity], which is not a real number."

It suggests I click on a link, which redirects me to
ref/message/Interpolation/indat for further
explanation.

I got around the Infinity problem in
the error message by replacing the Infinity in both the breakPoints vector
and in {-Infinity, Infinity} with a number that is larger than any number
in data1 but which is still finite:

brkPts = {-100, 2, 5, 7, 100}

and then tried BinLists again:


BinLists[data1,{brkPts},{{-100,100}}]

which did work and produced:

{{{{1,0.936229},{1,0.301525}}},{{{3,0.128096},{2,0.393583},{4,0.503822},{2,0.0068356}}},{{{5,0.253597},{6,0.0835316}}},{{}}}

But I wanted to extend BinLists so that it can
bin on any position in the data, not just the first element
of a sublist.

For example, if I wanted to bin on the second element
in a sublist in data1, I don't see how to go about doing that
with the above technique.


Using the binLists function in my first post it would look like
the following:

brkPts = Range[.1, 1.0, .1]
binLists[data1,brkPts, {2}]

which results in the following:

{{{6,0.0835316},{2,0.0068356}},{{3,0.128096}},{{5,0.253597}},{{2,0.393583},{1,0.301525}},{},{{4,0.503822}},{},{},{},{{1,0.936229}},{}}


The third parameter, {2}, to binLists allows me to specify
the element in a sublist of data1 which is to be used for binning,
no matter how complicated a sublist is (assuming, of course,
that each sublist has the same structure).

For example, if I wanted to bin
on the second element of the third element
in each sublist of data2 below, the
third input to binLists would be {3,2}:


data2={{1,0.936229, {2,.03}},{3,0.128096, {9,.73}},{2,0.393583, {4,.22}},{8,0.301525, {2,.18}},{1,0.503822, {6,.19}},{5,0.253597, {3,.20}},{6,0.0835316, {3,.29}},{2,0.0068356, {4,.81}}};

binLists[data2, brkPts2, {3,2}]

which results in

{{{1,0.936229,{2,0.03}}},{{8,0.301525,{2,0.18}},{1,0.503822,{6,0.19}}},{{2,0.393583,{4,0.22}},{5,0.253597,{3,0.2}},{6,0.0835316,{3,0.29}}},{},{},{},{},{{3,0.128096,{9,0.73}}},{{2,0.0068356,{4,0.81}}},{},{}}


I don't see any way from the documentation to
get BinLists to do this as it does not take as input
the specification of the element position in the data
upon which binning is to occur, like {3,2} above.

The trouble with binLists, as mentioned in the first post, is that
it is rather clumsy and depends on nested For loops
to do most of the work,
which, I assume from past experience, is quite slow
in terms of processor time. I was
wondering if there is a faster, perhaps more elegant
way to accomplish this.

Thank you.

Don

Bob Hanlon

Jan 8, 2012, 4:30:59 AM
With my version I do not have a problem using infinities as the boundaries:

$Version

"8.0 for Mac OS X x86 (64-bit) (October 5, 2011)"

breakPoints = {-Infinity, 2, 5, 7, Infinity};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
{4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

res1 = First /@ BinLists[data1, {breakPoints}, {{-Infinity, Infinity}}]

{{{1, 0.936229}, {1, 0.301525}}, {{3, 0.128096}, {2, 0.393583}, {4,
0.503822}, {2, 0.0068356}}, {{5, 0.253597}, {6, 0.0835316}}, {}}

For your second example, note that your bins do not cover all of your
data and those items with second element below 0.1 or greater than 1
should not appear.

brkPts = Range[.1, 1.0, .1];

res2 = BinLists[data1, {{-Infinity, Infinity}}, {brkPts}] // First

{{{3, 0.128096}}, {{5, 0.253597}}, {{2, 0.393583}, {1, 0.301525}}, {}, {{4,
0.503822}}, {}, {}, {}, {{1, 0.936229}}}

To obtain the result that you stated, I redefine your brkPts

brkPts2 = Flatten[{-Infinity, Range[.1, 1.0, .1], Infinity}];

res3 = BinLists[data1, {{-Infinity, Infinity}}, {brkPts2}] // First

{{{6, 0.0835316}, {2, 0.0068356}}, {{3, 0.128096}}, {{5, 0.253597}}, {{2,
0.393583}, {1, 0.301525}}, {}, {{4, 0.503822}}, {}, {}, {}, {{1,
0.936229}}, {}}

For your third example, brkPts2 is undefined. I will use brkPts2 from
my last example. For the general case, I would use Cases and Table

data2 = {{1, 0.936229, {2, .03}}, {3, 0.128096, {9, .73}}, {2,
0.393583, {4, .22}},
{8, 0.301525, {2, .18}}, {1, 0.503822, {6, .19}}, {5, 0.253597, {3, .20}},
{6, 0.0835316, {3, .29}}, {2, 0.0068356, {4, .81}}};

binLists[array_List, breakPts_List, pos_List: {}] :=
If[pos == {},
BinLists[array, {breakPts}],
Table[Cases[
array, _?(breakPts[[k]] <= #[[Sequence @@ pos]] < breakPts[[k + 1]] &)],
{k, Length[breakPts] - 1}]]

res1 == binLists[data1, breakPoints, {1}]

True

res2 == binLists[data1, brkPts, {2}]

True

res3 == binLists[data1, brkPts2, {2}]

True

binLists[data2, brkPts2, {3, 2}]

{{{1, 0.936229, {2, 0.03}}}, {{8, 0.301525, {2, 0.18}}, {1,
0.503822, {6, 0.19}}}, {{2, 0.393583, {4, 0.22}}, {5,
0.253597, {3, 0.2}}, {6, 0.0835316, {3, 0.29}}}, {}, {}, {}, {}, {{3,
0.128096, {9, 0.73}}}, {{2, 0.0068356, {4, 0.81}}}, {}, {}}
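
The key step in the definition above is pulling the binning key out of each record with #[[Sequence @@ pos]]. The sketch below shows that step on one record, alongside the equivalent Extract call, and then a variant that computes all keys once and reuses them for every bin. The names rec and binByKey are hypothetical and introduced only for this illustration; whether the variant is actually faster has not been measured here.

```mathematica
data2 = {{1, 0.936229, {2, .03}}, {3, 0.128096, {9, .73}},
   {2, 0.393583, {4, .22}}, {8, 0.301525, {2, .18}},
   {1, 0.503822, {6, .19}}, {5, 0.253597, {3, .20}},
   {6, 0.0835316, {3, .29}}, {2, 0.0068356, {4, .81}}};

(* both forms pull out the element at position {3, 2} of one record *)
rec = First[data2];
rec[[Sequence @@ {3, 2}]]
Extract[rec, {3, 2}]

(* binByKey (hypothetical): extract every key once, then select per bin with Pick *)
binByKey[array_List, breakPts_List, pos_List] :=
 With[{keys = Extract[#, pos] & /@ array},
  Table[
   Pick[array, (breakPts[[k]] <= # < breakPts[[k + 1]]) & /@ keys],
   {k, Length[breakPts] - 1}]]

brkPts2 = Flatten[{-Infinity, Range[.1, 1.0, .1], Infinity}];

binByKey[data2, brkPts2, {3, 2}]
```

As with the Cases version, the break points must include the outer boundaries explicitly, and records whose key falls outside every interval are dropped.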


Bob Hanlon

Vince Virgilio

Jan 8, 2012, 4:32:00 AM

Darren Glosemeyer

Jan 10, 2012, 5:57:37 AM
Yes, this particular issue was fixed for version 8.0.4, so Don likely
just needs to get that update to get the fix. As a workaround, the
infinities can be replaced with values outside the range of the data, e.g.

breakPoints = {-10^6, 2, 5, 7, 10^6};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1,
0.301525}, {4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2,
0.0068356}};

res1 = First /@ BinLists[data1, {breakPoints}, {{-10^6, 10^6}}]

should work fine in his version.
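
Instead of a fixed constant such as 10^6, the finite outer bounds could also be derived from the data. The sketch below is one way to do that; finiteBounds, inner, xlo, xhi, ylo and yhi are hypothetical names introduced for illustration. The bounds are pushed past both the data and the inner break points, because BinLists bins are open on the right and the break points must stay strictly increasing.

```mathematica
data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
   {4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

inner = {2, 5, 7};   (* the finite break points *)

(* hypothetical helper: bounds strictly outside the keys and the inner break points *)
finiteBounds[keys_List, innerPts_List: {}] :=
 With[{all = Join[keys, innerPts]}, {Min[all] - 1, Max[all] + 1}]

{xlo, xhi} = finiteBounds[data1[[All, 1]], inner];
{ylo, yhi} = finiteBounds[data1[[All, 2]]];

breakPoints = Join[{xlo}, inner, {xhi}];

res1 = First /@ BinLists[data1, {breakPoints}, {{ylo, yhi}}]
```

The margin of 1 is arbitrary; any positive margin that keeps every data point inside the outermost bins would serve the same purpose.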

Darren Glosemeyer
Wolfram Research