optim

16 views
Skip to first unread message

Yoni Sidi

unread,
Nov 27, 2015, 1:36:20 PM11/27/15
to Israel R User Group
is there an optim in r that can solve this?

i have 2 numeric vectors, one long (x) and one short (x.agg). The short one is an aggregate of cells from the long vector, just i dont know which ones. the overall sum of each vector is the same.

example:

x=c(587,589,548,551,586,727,749,703,694,548,624,644,642,419,587,633,745,609,619,762,645,686,715,714,708,480,667,678,696,729,656,627,441,638,651,687,571,468,420,395)

x
.agg=c(4855,3532,1423,989,3775,2755,4082,3427)

> sum(x)
[1] 24838



amit gal

unread,
Nov 27, 2015, 1:47:18 PM11/27/15
to israel-r-...@googlegroups.com

You can model it as an integer programming optimization. There is a solver for that in R (rsymphony i think)
Suppose x has the values and g are the groups, define binary variables b_ij whether x_i belongs to group j. The constraints are straightforward, and you try to maximize the sum.

--
You received this message because you are subscribed to the Google Groups "Israel R User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to israel-r-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

amit gal

unread,
Nov 27, 2015, 6:23:44 PM11/27/15
to israel-r-...@googlegroups.com
here is a code that solves the problem. however, it is not practical to use it. integer programming is NP-complete, and even relatively small problems become intractable. probably there are better solutions here, although not asymptotically, but for practical purposes.

for example the above problem took too long to solve. I checked it on x of length 10 divided into three groups, and it worked fine.

for what it's worth here is the code:

require("lpSolve")
findpar = function(x,y) {
 
  nx = length(x)
  ng = length(y)
  vr = ng*nx
  const1 = t(sapply(1:nx,function(i) {a = rep(0,vr); a[(i-1)*ng+1:ng] =  1; a}))
  steps = 0:(nx-1)*ng
  const2 = t(sapply(1:ng,function(i) {a = rep(0,vr); a[steps+i] = x; a}))

  dir1=rep("<=",nx)
  dir2=rep("<=",ng)
  dir=c(dir1,dir2)
  const = rbind(const1,const2)
  rhs1 = rep(1,nx)
  rhs2 = y
  rhs=c(rhs1,rhs2)
  trg = rep(1,vr)

  print("start")
  a = lp(direction = "max",
         const.mat = const,
         const.dir = dir,
         const.rhs = rhs,
         objective.in = trg,
         all.bin=T)
  print("end")
  b = a$solution
  g = sapply(1:nx, function(i) which(b[((i-1)*ng+1):(i*ng)]==1))
  return(g)
}

Yoni Sidi

unread,
Nov 28, 2015, 12:07:11 AM11/28/15
to Israel R User Group
cool. thanks!

Yoni Sidi

unread,
Nov 28, 2015, 1:43:21 PM11/28/15
to Israel R User Group
i asked a huji-cs friend. he said this is indeed an np complete problem. but said there is an algo called subset sum (FPTAS/knapsack). here it is in R (its fast also)

library(adagio)
x
=c(587,589,548,551,586,727,749,703,694,548,624,644,642,419,587,633,745,609,619,762,645,686,715,714,708,480,667,678,696,729,656,627,441,638,651,687,571,468,420,395)
 
x
.agg=c(4855,3532,1423,989,3775,2755,4082,3427)
x2
=subsetsum(x,x.agg[1])

> x2
$val
[1] 4855


$inds
[1]  1  3  4  5  7 10 12 13

> sum(x[x2$inds])
[1] 4855


On Saturday, November 28, 2015 at 7:07:11 AM UTC+2, Yoni Sidi wrote:
cool. thanks!

Yoni Sidi

unread,
Nov 28, 2015, 2:05:24 PM11/28/15
to Israel R User Group
this became an interesting problem:

x2=mdply(1:1000,.fun = function(ind){
s
=sample(x = x,size = length(x),replace = F)
s
.ind=subsetsum(s,x.agg[1])[[2]]
data
.frame(i=ind,x=s[s.ind])
},.progress = "text")


hist
(x2$x)


amit gal

unread,
Nov 29, 2015, 3:54:23 AM11/29/15
to israel-r-...@googlegroups.com
note that the indices:

i=c(2,  3,  5,  7,  8, 13, 14, 19)
give you the same partial sum of 4855
and so are about 117,000 other combinations!!!
the problem is to find not a signle subset, but a complete partition.


--

amit gal

unread,
Nov 29, 2015, 4:52:01 AM11/29/15
to israel-r-...@googlegroups.com
note that subsetsum will find one solution to one sum. often there is much more than one solution, and the real problem is to find a partition that consider all the sums you have in x.agg

the code I sent you (privately, will be happy to share here if anyone is interested), finds all possible solutions to all sums. it takes more time, but then, it allows you to make better inference (e.g. if there is a sum with only one solution then it must be it, and we can safely remove solutions to other sums that uses the specific participation. or if there is an element of x that participates in all solutions for a specific sum, then we can eliminate any solution that uses this element in other sums, etc.) or for employing backtracking mechanism of trial and error.



On Sat, Nov 28, 2015 at 8:43 PM, Yoni Sidi <yon...@gmail.com> wrote:

--

Yoni Sidi

unread,
Nov 29, 2015, 8:05:14 AM11/29/15
to Israel R User Group
you are right. I'll try ur code again and pray my computer behaves better this time ;)

Yotam Hechtlinger

unread,
Nov 29, 2015, 9:14:22 AM11/29/15
to israel-r-...@googlegroups.com

Hi Amit,

In your code - are you still solving that as integer programing, or there is a more efficient way?

Yotam.

amit gal

unread,
Nov 29, 2015, 9:28:46 AM11/29/15
to israel-r-...@googlegroups.com
below is the code. the function genpos(x,y) generates all the possibilities of subsets of x to sum up to values in y. it uses a dynamic programming approach, and runs in O(length(x)*max(y)) time. it returns a list of lists of vectors.

ex = function(l,i) {
  lapply(l,function(v) c(v,i))
}

genpos = function(x,y) {
 
  nx = length(x)
  ny = max(y)
  result = list()
 
  #init
  result[[as.character(x[1])]] = list(x[1])
  #loop
  for (i in 2:nx) {
    print(i)
    tmp = list()   
    for (j in 1:ny) {
      l1 = list()
      l2 = list()
      l3 = result[[as.character(j)]]
     
      if (j==x[i]) l1 = list(x[i])
      l2 = lex(result[[as.character(j-x[i])]],x[i])
      l = c(l1,l2)
      if(length(l)>0) {
        tmp[[as.character(j)]] = l
      }
    }
    #merge tmp and result
    for (ln in names(tmp)) {
      result[[ln]] = c(result[[ln]],tmp[[ln]])
    }
  }
  return(result[as.character(y)])
}

amit gal

unread,
Nov 29, 2015, 9:40:38 AM11/29/15
to israel-r-...@googlegroups.com
should note that the integer programming solves the overall problem (if it works, practically). the dynamic programming just lists the possibilities, but can't choose which is right. needs lots of trial and error there. but in some lucky cases, it may reduce significantly the complexity of the problem.

Yoni Sidi

unread,
Nov 29, 2015, 9:47:19 AM11/29/15
to Israel R User Group
i think i can sample a starting set and work forward uses subset x[-id] recursivley. if i have slack then it restarts until it uses all cells. or will that also be unending?
Reply all
Reply to author
Forward
0 new messages