Use jsprit for clustering problem

956 views
Skip to first unread message

He Huang

unread,
Aug 25, 2015, 7:47:28 AM8/25/15
to jsprit-mailing-list
Hi,

jsprit has been working well for me (thanks to the developers!), except that, when dealing with multi-vehicle VRP, I've found it usually generates a better solution if I cluster the jobs first and then run jsprit for each individual cluster as a single-vehicle VRP. For example, this figure in WHATS_NEW.md for 2015-03-12 new release v1.6 is a really big improvement from this one, but some vehicles still do a few jobs in other vehicle's cluster (e.g., route 1 and 3).

I usually solve the clustering stage by applying kmeans or modeling it as a minimum cost flow problem (if I want to have control on the size of each cluster).

My question here is:
If I set the custom objective function in jsprit as some objective function for clustering problem, for example, the within-cluster sum of squares (WCSS) as for kmeans, will jsprit be able to generate a solution for the clustering problem?

If it works, I can even take into account the time windows and other constraints in the clustering stage.

Thank you.

Best regards,
He

in...@opendoorlogistics.com

unread,
Aug 25, 2015, 8:24:25 AM8/25/15
to jsprit-ma...@googlegroups.com
>> If I set the custom objective function in jsprit as some objective function for clustering problem,
>> for example, the within-cluster sum of squares (WCSS) as for kmeans, will jsprit be able to generate a solution for the clustering problem?

I'm not sure, but you would definitely need BOTH a custom objective function and a soft constraint, as the custom objective function in jsprit is not considered during individual job insertions (but soft constraints are). 

>> modeling it as a minimum cost flow problem (if I want to have control on the size of each cluster).

Could you describe your approach to clustering using a minimum cost flow problem in more detail? Also what library do you solve it in?

Best regards

Phil

Philip Welch
Open Door Logistics Limited
Specialists in open source solutions for transport logistics.

Skype: opendoorlogistics



---- On Tue, 25 Aug 2015 12:47:28 +0100 He Huang <jie3...@gmail.com> wrote ----
--
You received this message because you are subscribed to the Google Groups "jsprit-mailing-list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jsprit-mailing-...@googlegroups.com.
To post to this group, send email to jsprit-ma...@googlegroups.com.
Visit this group at http://groups.google.com/group/jsprit-mailing-list.
For more options, visit https://groups.google.com/d/optout.


He Huang

unread,
Aug 25, 2015, 10:58:50 PM8/25/15
to jsprit-mailing-list, in...@opendoorlogistics.com
Hi Dr. Welch,

Thank you very much for your reply. Please kindly find my reply below.



I'm not sure, but you would definitely need BOTH a custom objective function and a soft constraint, as the custom objective function in jsprit is not considered during individual job insertions (but soft constraints are). 


Ah, soft constraint. I guess that is the reason why my tests did not work out. Could you please kindly elaborate more on the soft constraint? I am not quite sure what kind of soft constraint I should add here. Thank you.
 
 
Could you describe your approach to clustering using a minimum cost flow problem in more detail? Also what library do you solve it in?


 I first construct a graph as follows: 
1) each job is a supply vertex with supply = 1 (demand = -1), 
2) each cluster is a demand vertex with demand = minimum size constraint for this cluster (if any, otherwise 0), 
3) there is a sink vertex with demand = total supply - sum of minimum demand across clusters, 
4) there is a directed edge from each job vertex to each cluster vertex with cost = squares of distance between job and cluster centroid, and 
5) there is a directed edge from each cluster vertex to the sink vertex with cost = 0 and capacity = maximum size constraint - minimum size constraint for this cluster. 

The figure will look like the following. Note that this graph will require that sum of minimum demand <= total supply <= sum of maximum demand. In the case where total supply > sum of maximum demand, i.e., some jobs have to be unassigned, you can add a directed edge from each job vertex (that can be unassigned) to the sink vertex with cost = penalty of unassignment. 

With the constructed graph, the clustering problem is formulated as a minimum cost flow problem, and I solved it in Python using Networkx library.


Best regards,
He

in...@opendoorlogistics.com

unread,
Aug 26, 2015, 4:49:50 AM8/26/15
to jsprit-ma...@googlegroups.com
>> Could you please kindly elaborate more on the soft constraint?

I can't find a good example for soft constraints (Stefan - do you know of one?) but see the interface SoftConstraint and its extended interfaces SoftRouteConstraint and SoftActivityConstraint. You add them to the constraintmanager when building the algorithm. See create method in https://github.com/jsprit/jsprit/blob/master/jsprit-core/src/main/java/jsprit/core/algorithm/box/Jsprit.java which adds the InsertionNoiseMaker. InsertionNoiseMaker is a SoftActivityConstraint - see https://github.com/jsprit/jsprit/blob/master/jsprit-core/src/main/java/jsprit/core/algorithm/box/InsertionNoiseMaker.java

These constraints are evaluated on each insertion and give the cost of an activity insertion, either at the route or activity (position) level. 

Thanks for the information on the clustering - very interesting. I have a couple more questions if this is OK:

- Do you use the network simplex algorithm in Networkx to solve the min cost flow?

- You already have the cluster positions (as you have the edge costs). Are you embedding this within a k-means so you (a) get the mean position per cluster, (b) assign jobs to clusters using min cost flow and then repeat this for a certain number of iterations?

- If you're using network simplex, wouldn't this sometimes split a job between two different clusters? How do you deal with this? (particular in the output results)?

Best regards

Phil

Philip Welch
Open Door Logistics Limited
Specialists in open source solutions for transport logistics.

Skype: opendoorlogistics



---- On Wed, 26 Aug 2015 03:58:50 +0100 He Huang <jie3...@gmail.com> wrote ----
--

He Huang

unread,
Aug 26, 2015, 5:41:20 AM8/26/15
to jsprit-mailing-list, in...@opendoorlogistics.com
Hi Dr. Welch,

Thanks for your reply. Please find my reply below:



I can't find a good example for soft constraints (Stefan - do you know of one?) but see the interface SoftConstraint and its extended interfaces SoftRouteConstraint and SoftActivityConstraint. You add them to the constraintmanager when building the algorithm. See create method in https://github.com/jsprit/jsprit/blob/master/jsprit-core/src/main/java/jsprit/core/algorithm/box/Jsprit.java which adds the InsertionNoiseMaker. InsertionNoiseMaker is a SoftActivityConstraint - see https://github.com/jsprit/jsprit/blob/master/jsprit-core/src/main/java/jsprit/core/algorithm/box/InsertionNoiseMaker.java


Thanks for the links. I will take a look and hope I can understand... I am currently reading AbeProblemMinMax and trying to get an idea how it works.
 
These constraints are evaluated on each insertion and give the cost of an activity insertion, either at the route or activity (position) level. 


I think for the clustering problem it'd be at the route level as the position of the jobs within a route won't matter, right?
 
Thanks for the information on the clustering - very interesting. I have a couple more questions if this is OK:

- Do you use the network simplex algorithm in Networkx to solve the min cost flow?


I think both network simplex and capacity scaling will work, and I use the latter because it is faster in my case.
 
- You already have the cluster positions (as you have the edge costs). Are you embedding this within a k-means so you (a) get the mean position per cluster, (b) assign jobs to clusters using min cost flow and then repeat this for a certain number of iterations?


Yes it is run in iterations. For every iteration, a) centroids are calculated for each cluster, b) edge costs are updated with the updated centroids, and c) clusters are generated by solving the minimum cost flow problem. It terminates when the centroids do not change any more.
 
- If you're using network simplex, wouldn't this sometimes split a job between two different clusters? How do you deal with this? (particular in the output results)?


The job size is set as 1 for all jobs (demand = -1 for all job vertices). If a job has a size larger than 1, say 2, yes it is possible that it will be split, unless there is a way to constraint the flow on each outgoing edge to be either 0 or the same as the job size. But I guess there is no easy way to do it with networkx. That is actually one drawback of the method - not able to handle job size - among others, e.g., not able to consider time windows, etc., and that is exactly why I would like to use jsprit to solve the clustering problem.

Or did you mean that, even if the job size is set as 1, it might still be split as fractions? I did not really look into the code of networkx, but this never happened in my case.

in...@opendoorlogistics.com

unread,
Aug 26, 2015, 6:41:09 AM8/26/15
to jsprit-ma...@googlegroups.com
>> I think for the clustering problem it'd be at the route level as the position of the jobs within a route won't matter, right?

Yes, however if you need to enforce a minimum route size and you include a penalty for this in the insertion cost, you might run into issues. It's difficult to model minimum route size (i.e. cluster size in your case) with an approach like jsprit which uses job insertion heuristics. Have a look at how the cheapest insertion and regret insertion codes work in jsprit...


>> Or did you mean that, even if the job size is set as 1, it might still be split as fractions? I did not really look into the code of networkx, but this never happened in my case.

Yes - but I don't know much about network flow problems, simplex and I know nothing about capacity scaling algorithms. I just know that if you're doing a linear-programming type solution, then unless you're using a fancy mixed integer programming solver (like simplex), you're solving the case where all decision variables are real valued (i.e. can take any value, not just integer values like 1). Hence getting a partial assignment (split as fractions) could be an issue. I think there are some circumstances where this is guaranteed not to happen in flow problems, but I know very little about this subject.

Best regards

Phil

Philip Welch
Open Door Logistics Limited
Specialists in open source solutions for transport logistics.

Skype: opendoorlogistics



---- On Wed, 26 Aug 2015 10:41:20 +0100 He Huang <jie3...@gmail.com> wrote ----
--

He Huang

unread,
Aug 26, 2015, 7:24:07 AM8/26/15
to jsprit-mailing-list, in...@opendoorlogistics.com
Hi Dr. Welch,

Thank you for the reply.


Yes, however if you need to enforce a minimum route size and you include a penalty for this in the insertion cost, you might run into issues. It's difficult to model minimum route size (i.e. cluster size in your case) with an approach like jsprit which uses job insertion heuristics. Have a look at how the cheapest insertion and regret insertion codes work in jsprit...


If by "cluster size" you mean the number of jobs in the cluster, it is not what I am trying to minimize here. In the clustering problem I try to minimize the within-cluster sum of squares (WCSS) as in kmeans. Cluster size (number of jobs in the cluster) is a constraint I can model with the minimum cost flow formulation but not with kmeans.

Yes - but I don't know much about network flow problems, simplex and I know nothing about capacity scaling algorithms. I just know that if you're doing a linear-programming type solution, then unless you're using a fancy mixed integer programming solver (like simplex), you're solving the case where all decision variables are real valued (i.e. can take any value, not just integer values like 1). Hence getting a partial assignment (split as fractions) could be an issue. I think there are some circumstances where this is guaranteed not to happen in flow problems, but I know very little about this subject.


I think you mean the Integrality Theorem: for network flow problems with integer data, every basic feasible solution and, in particular, every basic optimal solution assigns integer flow to every arc.

Best regards,
He

He Huang

unread,
Aug 28, 2015, 3:12:25 AM8/28/15
to jsprit-mailing-list, in...@opendoorlogistics.com
Hi,

I have added a soft constraint according to the objective function (which is basically the within-cluster sum of squares (WCSS) as for kmeans), and the tests show that the capacity constraint seems to "ruin" the cluster - I haven't tested other constraints such as time window, etc.

The clustering results are shown as follows, with and without capacity constraint, respectively:

clustering result with capacity constraint


clustering result without capacity constraint


The clustering result without capacity constraint seems fine, but that with capacity constraint is really bad.

I wonder why it is so? Any idea will be appreciated.

Stefan Schröder

unread,
Sep 14, 2015, 2:50:16 AM9/14/15
to jsprit-mailing-list, in...@opendoorlogistics.com
Hi He and Phil,

Interesting discussion. I do not know much about clustering. But if jsprit cannot find certain solution since your problem contains clusters, try to increase the probability of
Strategy.CLUSTER_BEST and Strategy.CLUSTER_REGRET like this with Jsprit.Builder.newInstance()

.setProperty(Strategy.CLUSTER_BEST, 1.0)

etc.

It uses http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/ml/clustering/DBSCANClusterer.html.

Best
Stefan
Reply all
Reply to author
Forward
0 new messages