12 views

Skip to first unread message

Dec 9, 2005, 6:39:57 AM12/9/05

to

Hello out there,

I have a set of variables each of which has a certain weight or

probability assigned. Say

a -> 50

b -> 30

c -> 10

d -> 10

I have a set of variables each of which has a certain weight or

probability assigned. Say

a -> 50

b -> 30

c -> 10

d -> 10

I want an algorithm which (in the above example) selects 'a' in every

other run and 'c' in 10% of the runs (approximately of course).

My attempt below fails miserably. Using the weights above, 'a' scores

way too high, mostly at the expense of 'c' and 'd', which are far away

from their expected 10% shares.

Evidently my logic is flawed: Foreach variable I get the current

probability by multiplying a random value with the associated weight,

and then select the one with the highest result.

Any advice on the 101 of probability applicable here will be greatly

appreciated.

Best regards

Helmut Giese

---

#

# $lst is a list of 'name / weight' pairs, like <a 20 b 10 c 5>

#

proc selectProb {lst} {

# turn list into array

array set weight $lst

# get the sum of the weights

set sum 0

foreach n [array names weight] {

incr sum $weight($n)

}

# make a pair 'weighted random value / name' for each

foreach n [array names weight] {

set prob($n) [list [expr {(rand() * $weight($n)) / $sum}] $n]

}

# sort by probability

foreach n [array names prob] {

lappend pLst $prob($n)

}

set pLst [lsort -decreasing $pLst]

# but return the names

foreach p $pLst {

lappend res [lindex $p 1]

}

return $res

}

# test

foreach n {a b c d} {

set cnt($n) 0

}

for {set i 0} {$i < 100} {incr i} {

set res [selectProb {a 50 b 30 c 10 d 10}]

# count the "winner"

incr cnt([lindex $res 0])

}

parray cnt

---

Dec 9, 2005, 7:50:22 AM12/9/05

to

Helmut Giese wrote:

> Hello out there,

> I have a set of variables each of which has a certain weight or

> probability assigned. Say

> a -> 50

> b -> 30

> c -> 10

> d -> 10

>

> I want an algorithm which (in the above example) selects 'a' in every

> other run and 'c' in 10% of the runs (approximately of course).

>

> My attempt below fails miserably. Using the weights above, 'a' scores

> way too high, mostly at the expense of 'c' and 'd', which are far away

> from their expected 10% shares.

> Evidently my logic is flawed: Foreach variable I get the current

> probability by multiplying a random value with the associated weight,

> and then select the one with the highest result.

>

> Any advice on the 101 of probability applicable here will be greatly

> appreciated.

> Best regards

> Helmut Giese

> Hello out there,

> I have a set of variables each of which has a certain weight or

> probability assigned. Say

> a -> 50

> b -> 30

> c -> 10

> d -> 10

>

> I want an algorithm which (in the above example) selects 'a' in every

> other run and 'c' in 10% of the runs (approximately of course).

>

> My attempt below fails miserably. Using the weights above, 'a' scores

> way too high, mostly at the expense of 'c' and 'd', which are far away

> from their expected 10% shares.

> Evidently my logic is flawed: Foreach variable I get the current

> probability by multiplying a random value with the associated weight,

> and then select the one with the highest result.

>

> Any advice on the 101 of probability applicable here will be greatly

> appreciated.

> Best regards

> Helmut Giese

Basically it works like this:

You divide your probability space [1-100] (if you use percentage) into

intervals based on the weights you have. So for your example you map the

letters to this intervals:

a -> [1 50]

b -> [51 80]

c -> [81 90]

d -> [91 100]

Now you generate a random number between 1 and 100 and lookup in which

interval it is, thats the winner.

If you only allow integer weights, you can get lazy like this:

proc create_wtable {var weights} {

upvar 1 $var table

set table [list {}]

foreach {key weight} $weights {

for {set i 0} {$i < $weight} {incr i} {

lappend table $key

}

}

# return the value for probability 1.0

return [expr {[llength $table]-1}]

}

proc pick_random {var max} {

upvar 1 $var table

lindex $table [math::random 1 $max]

}

# test

set max [create_wtable wtab {a 50 b 30 c 10 d 10} ]

puts [pick_random wtab $max]

Michael

Dec 9, 2005, 8:31:56 AM12/9/05

to

Helmut Giese wrote:

> Hello out there,

> I have a set of variables each of which has a certain weight or

> probability assigned. Say

> a -> 50

> b -> 30

> c -> 10

> d -> 10

Hi Helmut,> Hello out there,

> I have a set of variables each of which has a certain weight or

> probability assigned. Say

> a -> 50

> b -> 30

> c -> 10

> d -> 10

This any better?:

#!/usr/bin/tclsh

set probabilities {a 50 b 30 c 10 d 10 }

set curr 0

foreach {tok prob} $probabilities {

lappend ::stat(toks) $tok

set ::stat($tok,min) $curr

incr curr $prob

set ::stat($tok,max) $curr

set ::res($tok) 0

}

set ::stat(summ) $curr

# rand() gives an even distribution from 0.0 to 0.999999999

proc randtok {} {

set rand [ expr { rand() * $::stat(summ) }]

foreach tok $::stat(toks) {

if { $rand >= $::stat($tok,min) } {

if { $rand < $::stat($tok,max) } {

return $tok

}

}

}

return this_is_an_error

}

for {set i 0} {$i < 10000 } {incr i} {

incr ::res([randtok])

}

parray ::res

#end

G!

uwe

Dec 9, 2005, 2:31:11 PM12/9/05

to

Helmut Giese wrote:

>I have a set of variables each of which has a certain weight or

>probability assigned. Say

>a -> 50

>b -> 30

>c -> 10

>d -> 10

>

>I want an algorithm which (in the above example) selects 'a' in every

>other run and 'c' in 10% of the runs (approximately of course).

>

>My attempt below fails miserably. Using the weights above, 'a' scores

>way too high, mostly at the expense of 'c' and 'd', which are far away

>from their expected 10% shares.

>Evidently my logic is flawed: Foreach variable I get the current

>probability by multiplying a random value with the associated weight,

>and then select the one with the highest result.

>

>Any advice on the 101 of probability applicable here will be greatly

>appreciated.

First, compute the cumulative sum (or "running total")

of the variables and their weights, and construct a new

list that pairs each variable with the

set csum 0

set clst [list]

foreach {k w} $lst {

set csum [expr {$csum + $w}]

lappend clst $k $csum

}

Next, generate a random number between 0 and the total weight:

set rn [expr {rand() * $sum}]

Then select the first element in the transformed list with a

cumulative sum greater than that number:

foreach {k c} $clst {

if {$c > $rn} { break }

}

# ==> $k is the answer.

With the sample data above, $clst is:

a 50 b 80 c 90 d 100

So a random number between 0 and $csum == 100 will

be <50 half the time, selecting 'a'; will lie between

50 and 80 30% of the time, yielding 'b'; will lie between

80 and 90 10% of the time, et cetera.

Tweak the algorithm above as needed depending on the required

performance characteristics. (If you need to do a lot of

selections from a fixed data set, a binary search through

the cumulative sum list will be profitable.)

--Joe English

Dec 10, 2005, 5:01:19 AM12/10/05

to

Many thanks to all of you.

Sorry for the delay - we have a family reunion and priorities have to

shift sometimes.

On first sight it looks like each of the solutions will solve my

problem but I'll have to wait till after the weekend to actually do

anything with them.

Thanks again and best regards.

Yes, Mom, I'm coming.

Helmut - hosting the whole family for the weekend - Giese

Sorry for the delay - we have a family reunion and priorities have to

shift sometimes.

On first sight it looks like each of the solutions will solve my

problem but I'll have to wait till after the weekend to actually do

anything with them.

Thanks again and best regards.

Yes, Mom, I'm coming.

Helmut - hosting the whole family for the weekend - Giese

Dec 12, 2005, 8:31:11 AM12/12/05

to

Works like a charm - and the logic behind is so easy and self

explaining. Wonder why I took such a twisted approach on Friday.

Thanks again

Helmut Giese

explaining. Wonder why I took such a twisted approach on Friday.

Thanks again

Helmut Giese

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu