optim

Yoni Sidi

unread,

Nov 27, 2015, 1:36:20 PM11/27/15

to Israel R User Group

is there an optim in r that can solve this?

i have 2 numeric vectors, one long (x) and one short (x.agg). The short one is an aggregate of cells from the long vector, just i dont know which ones. the overall sum of each vector is the same.

example:

x=c(587,589,548,551,586,727,749,703,694,548,624,644,642,419,587,633,745,609,619,762,645,686,715,714,708,480,667,678,696,729,656,627,441,638,651,687,571,468,420,395)

x.agg=c(4855,3532,1423,989,3775,2755,4082,3427)

> sum(x)
[1] 24838

amit gal

unread,

Nov 27, 2015, 1:47:18 PM11/27/15

to israel-r-...@googlegroups.com

You can model it as an integer programming optimization. There is a solver for that in R (rsymphony i think)
Suppose x has the values and g are the groups, define binary variables b_ij whether x_i belongs to group j. The constraints are straightforward, and you try to maximize the sum.

--
You received this message because you are subscribed to the Google Groups "Israel R User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to israel-r-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

amit gal

unread,

Nov 27, 2015, 6:23:44 PM11/27/15

to israel-r-...@googlegroups.com

here is a code that solves the problem. however, it is not practical to use it. integer programming is NP-complete, and even relatively small problems become intractable. probably there are better solutions here, although not asymptotically, but for practical purposes.

for example the above problem took too long to solve. I checked it on x of length 10 divided into three groups, and it worked fine.

for what it's worth here is the code:

require("lpSolve")
findpar = function(x,y) {

nx = length(x)
ng = length(y)
vr = ng*nx
const1 = t(sapply(1:nx,function(i) {a = rep(0,vr); a[(i-1)*ng+1:ng] = 1; a}))
steps = 0:(nx-1)*ng
const2 = t(sapply(1:ng,function(i) {a = rep(0,vr); a[steps+i] = x; a}))

dir1=rep("<=",nx)
dir2=rep("<=",ng)
dir=c(dir1,dir2)
const = rbind(const1,const2)
rhs1 = rep(1,nx)
rhs2 = y
rhs=c(rhs1,rhs2)
trg = rep(1,vr)

print("start")
a = lp(direction = "max",
         const.mat = const,
         const.dir = dir,
         const.rhs = rhs,
         objective.in = trg,
         all.bin=T)
print("end")
b = a$solution
g = sapply(1:nx, function(i) which(b[((i-1)*ng+1):(i*ng)]==1))
return(g)
}

Yoni Sidi

unread,

Nov 28, 2015, 12:07:11 AM11/28/15

to Israel R User Group

cool. thanks!

Yoni Sidi

unread,

Nov 28, 2015, 1:43:21 PM11/28/15

to Israel R User Group

i asked a huji-cs friend. he said this is indeed an np complete problem. but said there is an algo called subset sum (FPTAS/knapsack). here it is in R (its fast also)

library(adagio)
x=c(587,589,548,551,586,727,749,703,694,548,624,644,642,419,587,633,745,609,619,762,645,686,715,714,708,480,667,678,696,729,656,627,441,638,651,687,571,468,420,395)
 
x.agg=c(4855,3532,1423,989,3775,2755,4082,3427)
x2=subsetsum(x,x.agg[1])

> x2
$val
[1] 4855


$inds
[1]  1  3  4  5  7 10 12 13

> sum(x[x2$inds])
[1] 4855

On Saturday, November 28, 2015 at 7:07:11 AM UTC+2, Yoni Sidi wrote:

cool. thanks!

Yoni Sidi

unread,

Nov 28, 2015, 2:05:24 PM11/28/15

to Israel R User Group

this became an interesting problem:

x2=mdply(1:1000,.fun = function(ind){
s=sample(x = x,size = length(x),replace = F)
s.ind=subsetsum(s,x.agg[1])[[2]]
data.frame(i=ind,x=s[s.ind])
},.progress = "text")


hist(x2$x)

amit gal

unread,

Nov 29, 2015, 3:54:23 AM11/29/15

to israel-r-...@googlegroups.com

note that the indices:

i=c(2, 3, 5, 7, 8, 13, 14, 19)

give you the same partial sum of 4855

and so are about 117,000 other combinations!!!

the problem is to find not a signle subset, but a complete partition.

--

amit gal

unread,

Nov 29, 2015, 4:52:01 AM11/29/15

to israel-r-...@googlegroups.com

note that subsetsum will find one solution to one sum. often there is much more than one solution, and the real problem is to find a partition that consider all the sums you have in x.agg

the code I sent you (privately, will be happy to share here if anyone is interested), finds all possible solutions to all sums. it takes more time, but then, it allows you to make better inference (e.g. if there is a sum with only one solution then it must be it, and we can safely remove solutions to other sums that uses the specific participation. or if there is an element of x that participates in all solutions for a specific sum, then we can eliminate any solution that uses this element in other sums, etc.) or for employing backtracking mechanism of trial and error.

On Sat, Nov 28, 2015 at 8:43 PM, Yoni Sidi <yon...@gmail.com> wrote:

--

Yoni Sidi

unread,

Nov 29, 2015, 8:05:14 AM11/29/15

to Israel R User Group

you are right. I'll try ur code again and pray my computer behaves better this time ;)

Yotam Hechtlinger

unread,

Nov 29, 2015, 9:14:22 AM11/29/15

to israel-r-...@googlegroups.com

Hi Amit,

In your code - are you still solving that as integer programing, or there is a more efficient way?

Yotam.

amit gal

unread,

Nov 29, 2015, 9:28:46 AM11/29/15

to israel-r-...@googlegroups.com

below is the code. the function genpos(x,y) generates all the possibilities of subsets of x to sum up to values in y. it uses a dynamic programming approach, and runs in O(length(x)*max(y)) time. it returns a list of lists of vectors.

ex = function(l,i) {
lapply(l,function(v) c(v,i))
}

genpos = function(x,y) {

nx = length(x)
ny = max(y)
result = list()

#init
result[[as.character(x[1])]] = list(x[1])
#loop
for (i in 2:nx) {
    print(i)
    tmp = list()
    for (j in 1:ny) {
      l1 = list()
      l2 = list()
      l3 = result[[as.character(j)]]

      if (j==x[i]) l1 = list(x[i])
      l2 = lex(result[[as.character(j-x[i])]],x[i])
      l = c(l1,l2)
      if(length(l)>0) {
        tmp[[as.character(j)]] = l
      }
    }
    #merge tmp and result
    for (ln in names(tmp)) {
      result[[ln]] = c(result[[ln]],tmp[[ln]])
    }
}
return(result[as.character(y)])
}

amit gal

unread,

Nov 29, 2015, 9:40:38 AM11/29/15

to israel-r-...@googlegroups.com

should note that the integer programming solves the overall problem (if it works, practically). the dynamic programming just lists the possibilities, but can't choose which is right. needs lots of trial and error there. but in some lucky cases, it may reduce significantly the complexity of the problem.

Yoni Sidi

unread,

Nov 29, 2015, 9:47:19 AM11/29/15

to Israel R User Group

i think i can sample a starting set and work forward uses subset x[-id] recursivley. if i have slack then it restarts until it uses all cells. or will that also be unending?

Reply all

Reply to author

Forward