
Extension to BinLists Function


Don

5 Jan 2012, 5:59:04
Hello,

The documentation shows examples of BinLists binning
one-dimensional vectors of numbers, such as the following
example:

data = {1,3,2,1,4,5,6,2};
breakPoints = {-Infinity,2,5,7,Infinity};

BinLists[data, {breakPoints}]

which returns:

{{1, 1}, {3, 2, 4, 2}, {5, 6}, {}}

I would like to bin entire sublists of data
of arbitrary depth, as in the following
example, where every sublist has two elements:

data1 = Transpose[{data, RandomReal[1, Length[data]]}]

which gives, for example, the following values for data1:

{{1,0.936229},{3,0.128096},{2,0.393583},{1,0.301525},{4,0.503822},{5,0.253597},{6,0.0835316},{2,0.0068356}}

In this simple example, the sublists are binned based on the value of the first element
of every sublist.

The result, using the same breakpoints (this time applied to the first
element of every sublist as in the example above),
should be:


{{{1,0.936229},{1,0.301525}},{{3,0.128096},{2,0.393583},{4,0.503822},{2,0.0068356}},{{5,0.253597},{6,0.0835316}},{}}
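Stated as a rule (the symbols here are introduced only for illustration): with padded breakpoints b_1 = -Infinity < b_2 < ... < b_m = Infinity and a position specification p = {p_1, ..., p_r}, the j-th sublist goes into bin k exactly when its key element x_j satisfies a half-open interval test. This matches the one-dimensional output above, where 2 falls in the second bin and 5 in the third.

```latex
x_j = \texttt{array}[\![\, j, p_1, \dots, p_r \,]\!], \qquad
s_j \in \mathrm{bin}_k \iff b_k \le x_j < b_{k+1}, \qquad k = 1, \dots, m-1
```

For data1 with p = {1} and b = (-Infinity, 2, 5, 7, Infinity), the sublist {2, 0.393583} has x = 2 and 2 <= 2 < 5, so it lands in bin 2, as in the expected result.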


The binLists function below does this job,
but it uses brute force, in the form of a couple of
nested For loops, to accomplish it.
Is there a more efficient way of binning
sublists of arbitrary depth?

Thank you.

Don

==========================================

For the second example above, which uses the
binLists function defined below, the inputs to the binLists
function are:

array = data1
breakPts = {2, 5, 7}
pos = {1}


binLists[data1, breakPts, pos]

returns

{{{1,0.936229},{1,0.301525}},{{3,0.128096},{2,0.393583},{4,0.503822},{2,0.0068356}},{{5,0.253597},{6,0.0835316}},{}}

which is the correct result.

===============================

Definition of binLists:

Remove[binLists];

binLists[array_List, breakPts_List, pos_List : {}] :=
 Module[{breakPtIntervalV, nIntervals, bins, elemV, j, k},

  (* consecutive {lower, upper} intervals, with open outer bins *)
  breakPtIntervalV = Partition[Join[{-Infinity}, breakPts, {Infinity}], 2, 1];

  nIntervals = Length[breakPtIntervalV];

  bins = Table[{}, {nIntervals}];

  (*
  elemV holds the element from each sublist in array that
  the binning is to be a function of
  *)

  If[Length[pos] > 0,
   elemV = #[[Apply[Sequence, pos]]] & /@ array,
   elemV = array
  ]; (* If Length *)

  For[j = 1, j <= Length[array], ++j,
   For[k = 1, k <= nIntervals, ++k,
    If[
     elemV[[j]] >= breakPtIntervalV[[k, 1]] &&
      elemV[[j]] < breakPtIntervalV[[k, 2]],
     AppendTo[bins[[k]], array[[j]]];
     Break[] (* intervals are disjoint, so stop at the first match *)
    ]
   ]; (* For k *)
  ]; (* For j *)

  Return[bins]

 ] (* End Module binLists *)
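A loop-free sketch along the same lines (hypothetical name binListsPick, not from the thread): compute every sublist's bin index once, by counting how many padded breakpoints are <= its key, then let Pick collect each bin. Like binLists, it assumes every sublist has the same structure, so that array[[All, Sequence @@ pos]] extracts all keys at once, and it pads breakPts with -Infinity and Infinity.

```mathematica
(* Hypothetical loop-free variant of binLists; same arguments and padding. *)
binListsPick[array_List, breakPts_List, pos_List : {}] :=
 Module[{bp, keys, idx},
  (* open outer bins, as in binLists *)
  bp = Join[{-Infinity}, breakPts, {Infinity}];
  (* key of every sublist: array[[j, Sequence @@ pos]] for all j at once *)
  keys = array[[All, Sequence @@ pos]];
  (* bin index k = number of breakpoints <= key, so bp[[k]] <= key < bp[[k + 1]] *)
  idx = Function[x, Count[bp, b_ /; b <= x]] /@ keys;
  (* collect the sublists whose index is k, for every bin k *)
  Table[Pick[array, idx, k], {k, Length[bp] - 1}]
 ]
```

It still compares each key against every breakpoint, but it avoids the repeated AppendTo calls, each of which copies the growing bin.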

Bob Hanlon

6 Jan 2012, 4:21:08
breakPoints = {-Infinity, 2, 5, 7, Infinity};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
{4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

As stated in the documentation, BinLists handles multidimensional data:

First /@ BinLists[data1, {breakPoints}, {{-Infinity, Infinity}}]

{{{1, 0.936229}, {1, 0.301525}}, {{3, 0.128096}, {2, 0.393583}, {4, 0.503822}, {2, 0.0068356}}, {{5, 0.253597}, {6, 0.0835316}}, {}}


Bob Hanlon

Heike Gramberg

6 Jan 2012, 4:22:41
For your second example you could do something like

index[bp_] := Function[{x}, Evaluate@Piecewise@ MapIndexed[{#2[[1]], #1[[1]] <= x < #1[[2]]} &, Partition[bp, 2, 1]]]

bins[lst_, breakPoints_] := With[{fx = index[breakPoints]},
Flatten[#, 1] & /@ Reap[Sow[#, fx[#[[1]]]]; & /@ lst, Range[Length[breakPoints] - 1]][[2]]]

then bins[data1, breakPoints] returns

{{{1, 0.936229}, {1, 0.301525}}, {{3, 0.128096}, {2, 0.393583}, {4, 0.503822}, {2, 0.0068356}}, {{5, 0.253597}, {6, 0.0835316}}, {}}

Here, index[] is just a helper function such that index[breakPoints][x] returns the index of the bin that x belongs to.
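One way to extend Heike's Reap/Sow approach to Don's position argument is to sow each sublist under the bin of the element at position pos rather than its first element. The sketch below uses the hypothetical name binsAt (not from the thread) and repeats Heike's index helper unchanged so that it runs on its own; as with bins, breakPoints must already include the outer -Infinity and Infinity bounds. It assumes the global symbol x has no value, since index builds its Piecewise body from x.

```mathematica
(* index[bp][x] gives the bin of x (0 if x is outside all bins); from the post above *)
index[bp_] := Function[{x}, Evaluate@Piecewise@
    MapIndexed[{#2[[1]], #1[[1]] <= x < #1[[2]]} &, Partition[bp, 2, 1]]]

(* Hypothetical: like bins[], but bins on the element at position pos *)
binsAt[lst_, breakPoints_, pos_List] := With[{fx = index[breakPoints]},
  Flatten[#, 1] & /@ Reap[
     (* tag every sublist with the bin of its key element *)
     Scan[Sow[#, fx[#[[Sequence @@ pos]]]] &, lst],
     (* one tag per bin, so empty bins come back as {} *)
     Range[Length[breakPoints] - 1]][[2]]]
```

For Don's data2, binsAt[data2, brk, {3, 2}] would bin on the second element of the third element of each sublist, for whatever breakpoint list brk (including the outer infinities) he chooses.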

Heike.


Don

7 Jan 2012, 5:21:11
Thank you Bob for your response to my problem.

I was unable to get a correct answer in exactly the way
you have formulated it.

When I do


breakPoints = {-Infinity, 2, 5, 7, Infinity};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
{4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

BinLists[data1, {breakPoints}, {{-Infinity, Infinity}}]

I get an error message which says:

Interpolation::indat: Data point {-\[Infinity], 0} contains abscissa -\[Infinity], which is not a real number.

It also suggests I click on a link, which redirects me to
ref/message/Interpolation/indat for further
explanation.

I got around the Infinity problem in
the error message by replacing the infinities in both the breakPoints vector
and in {-Infinity, Infinity} with finite numbers lying outside
the range of the values in data1:

brkPts = {-100, 2, 5, 7, 100}

and then tried BinLists again:


BinLists[data1,{brkPts},{{-100,100}}]

which did work and produced:

{{{{1,0.936229},{1,0.301525}}},{{{3,0.128096},{2,0.393583},{4,0.503822},{2,0.0068356}}},{{{5,0.253597},{6,0.0835316}}},{{}}}

But, I wanted to extend BinLists to being able to
bin on any position in the data, not just the first element
of a sublist.

For example, if I wanted to bin on the second element
in a sublist in data1, I don't see how to go about doing that
with the above technique.


Using the binLists function in my first post it would look like
the following:

brkPts = Range[.1, 1.0, .1]
binLists[data1,brkPts, {2}]

which results in the following:

{{{6,0.0835316},{2,0.0068356}},{{3,0.128096}},{{5,0.253597}},{{2,0.393583},{1,0.301525}},{},{{4,0.503822}},{},{},{},{{1,0.936229}},{}}


The third parameter, {2}, to binLists allows me to specify
the element in a sublist of data1 which is to be used for binning,
no matter how complicated a sublist is (assuming, of course,
that each sublist has the same structure).

For example, if I wanted to bin
on the second element of the third element
in each sublist of data2 below, the
third input to binLists would be {3,2}:


data2={{1,0.936229, {2,.03}},{3,0.128096, {9,.73}},{2,0.393583, {4,.22}},{8,0.301525, {2,.18}},{1,0.503822, {6,.19}},{5,0.253597, {3,.20}},{6,0.0835316, {3,.29}},{2,0.0068356, {4,.81}}};

binLists[data2, brkPts2, {3,2}]

which results in

{{{1,0.936229,{2,0.03}}},{{8,0.301525,{2,0.18}},{1,0.503822,{6,0.19}}},{{2,0.393583,{4,0.22}},{5,0.253597,{3,0.2}},{6,0.0835316,{3,0.29}}},{},{},{},{},{{3,0.128096,{9,0.73}}},{{2,0.0068356,{4,0.81}}},{},{}}


I don't see any way from the documentation to
get BinLists to do this as it does not take as input
the specification of the element position in the data
upon which binning is to occur, like {3,2} above.
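Although BinLists has no position argument, one hedged workaround (hypothetical name binListsVia, not from the thread, and not checked against version 8.0) is to bin the pairs {key, rowNumber} with the two-dimensional form Bob Hanlon uses, give the second coordinate a single bin that covers every row number, and then map the row numbers back to the original sublists. Unlike binLists, bp here must list every bin edge, including the outer ones.

```mathematica
(* Hypothetical workaround: let BinLists do the interval tests on {key, rowNumber} pairs *)
binListsVia[array_List, bp_List, pos_List : {}] :=
 Module[{n = Length[array], keys, pairs, binned},
  (* key of every sublist, e.g. pos = {3, 2} gives array[[All, 3, 2]] *)
  keys = array[[All, Sequence @@ pos]];
  (* attach the row number to each key *)
  pairs = Transpose[{keys, Range[n]}];
  (* bin on the key; a single bin [0, n + 1) in the second coordinate keeps every row *)
  binned = First /@ BinLists[pairs, {bp}, {{0, n + 1}}];
  (* map the row numbers in each bin back to the original sublists *)
  array[[Last /@ #]] & /@ binned]
```

The interval tests stay inside BinLists; the extra work is building the pairs and one Part extraction per bin.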

The trouble with binLists, as mentioned in the first post, is that
it is rather clumsy and depends on nested For loops
to do most of the work,
which, I assume from past experience, is quite slow
in terms of processor time. I was
wondering if there is a faster, perhaps more elegant,
way to accomplish this.

Thank you.

Don

Bob Hanlon

8 Jan 2012, 4:30:59
With my version I do not have a problem with using infinities as the boundaries:

$Version

"8.0 for Mac OS X x86 (64-bit) (October 5, 2011)"

breakPoints = {-Infinity, 2, 5, 7, Infinity};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1, 0.301525},
{4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2, 0.0068356}};

res1 = First /@ BinLists[data1, {breakPoints}, {{-Infinity, Infinity}}]

{{{1, 0.936229}, {1, 0.301525}}, {{3, 0.128096}, {2, 0.393583}, {4, 0.503822}, {2, 0.0068356}}, {{5, 0.253597}, {6, 0.0835316}}, {}}

For your second example, note that your bins do not cover all of your
data, so items whose second element is below 0.1 or above 1
should not appear:

brkPts = Range[.1, 1.0, .1];

res2 = BinLists[data1, {{-Infinity, Infinity}}, {brkPts}] // First

{{{3, 0.128096}}, {{5, 0.253597}}, {{2, 0.393583}, {1, 0.301525}}, {}, {{4, 0.503822}}, {}, {}, {}, {{1, 0.936229}}}

To obtain the result that you stated, I redefine your brkPts:

brkPts2 = Flatten[{-Infinity, Range[.1, 1.0, .1], Infinity}];

res3 = BinLists[data1, {{-Infinity, Infinity}}, {brkPts2}] // First

{{{6, 0.0835316}, {2, 0.0068356}}, {{3, 0.128096}}, {{5, 0.253597}}, {{2, 0.393583}, {1, 0.301525}}, {}, {{4, 0.503822}}, {}, {}, {}, {{1, 0.936229}}, {}}

For your third example, brkPts2 is undefined, so I will use brkPts2 from
my last example. For the general case, I would use Cases and Table:

data2 = {{1, 0.936229, {2, .03}}, {3, 0.128096, {9, .73}}, {2,
0.393583, {4, .22}},
{8, 0.301525, {2, .18}}, {1, 0.503822, {6, .19}}, {5, 0.253597, {3, .20}},
{6, 0.0835316, {3, .29}}, {2, 0.0068356, {4, .81}}};

binLists[array_List, breakPts_List, pos_List: {}] :=
If[pos == {},
BinLists[array, {breakPts}],
Table[Cases[
array, _?(breakPts[[k]] <= #[[Sequence @@ pos]] < breakPts[[k + 1]] &],
{k, Length[breakPts] - 1}]]

res1 == binLists[data1, breakPoints, {1}]

True

res2 == binLists[data1, brkPts, {2}]

True

res3 == binLists[data1, brkPts2, {2}]

True

binLists[data2, brkPts2, {3, 2}]

{{{1, 0.936229, {2, 0.03}}}, {{8, 0.301525, {2, 0.18}}, {1, 0.503822, {6, 0.19}}}, {{2, 0.393583, {4, 0.22}}, {5, 0.253597, {3, 0.2}}, {6, 0.0835316, {3, 0.29}}}, {}, {}, {}, {}, {{3, 0.128096, {9, 0.73}}}, {{2, 0.0068356, {4, 0.81}}}, {}, {}}


Bob Hanlon

Vince Virgilio

8 Jan 2012, 4:32:00

Darren Glosemeyer

10 Jan 2012, 5:57:37
Yes, this particular issue was fixed in version 8.0.4, so Don likely
just needs to get that update. As a workaround, the
infinities can be replaced with values outside the range of the data, e.g.

breakPoints = {-10^6, 2, 5, 7, 10^6};

data1 = {{1, 0.936229}, {3, 0.128096}, {2, 0.393583}, {1,
0.301525}, {4, 0.503822}, {5, 0.253597}, {6, 0.0835316}, {2,
0.0068356}};

res1 = First /@ BinLists[data1, {breakPoints}, {{-10^6, 10^6}}]

should work fine in his version.
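The same workaround can be automated with a small helper (hypothetical name finiteBreaks, not part of Mathematica) that swaps the infinities for finite values one unit beyond both the data and the finite breakpoints, so the outer bins cannot collide with an existing breakpoint.

```mathematica
(* Hypothetical helper: replace -Infinity/Infinity in a breakpoint list with
   finite values just outside the range of the data and the finite breakpoints *)
finiteBreaks[bp_List, data_] := Module[{finite = Select[bp, NumericQ], lo, hi},
  (* Min and Max flatten nested lists, so data may be any array of numbers *)
  lo = Min[data, finite] - 1;
  hi = Max[data, finite] + 1;
  bp /. {-Infinity -> lo, Infinity -> hi}]
```

For example, First /@ BinLists[data1, {finiteBreaks[{-Infinity, 2, 5, 7, Infinity}, data1]}, {finiteBreaks[{-Infinity, Infinity}, data1]}] should then avoid the Interpolation::indat message in versions affected by the bug.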

Darren Glosemeyer
Wolfram Research