Reproducible Issue

Fan Zeng

unread,

Jun 4, 2022, 9:12:45 AM6/4/22

to OptaPlanner development

Hi

First allow me to share a success story. I am developing a timetabling solution for primary/secondary schools. There were 2 separate implementations, one in OptaPlanner and the other in Google's ortools. The OptaPlanner version is about 10 times faster with real world data. It is also more tolerant of user-defined constraints that contradict each other.

Thanks for making a great product!

I would seek your help with the reproducible issue. For the same problem data, the results are always between runs. The scores also change but less frequently. Here are the configurations for the solver.

Phases: default construction heuristic followed by step counting hill climb

moveThreadCount=AUTO

terminationConfig=bestScoreLimit of (hard, medium, soft) = (0, 0, 0)

or minuteSpentLimit of 15 minutes (never reached, all runs were under 10 minutes)

or unimprovedStepCount=10000

stepCountingHillClimbingSize = 600

acceptedCountLimit=6

swapMoveSelector.cacheType = PHASE

a custom filter on SwapMove

Planning entities or planning values are stored in Java List.

Radovan Synek

unread,

Jun 6, 2022, 6:55:35 AM6/6/22

to optapla...@googlegroups.com

Hello,

it's interesting that even the score changes (although, as you wrote, less frequently) despite the fact that you configured bestScoreLimit termination. Was the score ever worse than "0hard/0medium/0soft"?

Do you use the reproducible environment mode [1] (the default one)? Ideally, could you share the solverConfig.xml?

[1] https://www.optaplanner.org/docs/optaplanner/latest/planner-configuration/planner-configuration.html#environmentModeReproducible

Best regards,

Radek

--
You received this message because you are subscribed to the Google Groups "OptaPlanner development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to optaplanner-d...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/optaplanner-dev/02a6be68-bc00-4fa1-be54-131bd2d41ec6n%40googlegroups.com.

--

Radovan Synek

Principal Software Engineer, Business Automation

Red Hat EMEA

Brno, Czech Republic

Fan Zeng

unread,

Jun 6, 2022, 9:19:55 PM6/6/22

to OptaPlanner development

Thank you Radovan.

The optimal (highest possible) score is 0hard/0medium/0soft. Constraints have penalties and no rewards. A negative score means there were unsatisfied constraints.

The reproducible environment mode is the default as confirmed in the log.

2022-06-06 20:53:32,270 INFO Solving started: time spent (336), best score (-3796init/0hard/0medium/0soft), environment mode (REPRODUCIBLE), move thread count (4), random (JDK with seed 0).
2022-06-06 20:53:35,241 INFO Construction Heuristic phase (0) ended: time spent (3307), best score (0hard/-6610medium/0soft), score calculation speed (29962/sec), step total (1898).
2022-06-06 20:56:38,671 INFO Local Search phase (1) ended: time spent (186737), best score (0hard/0medium/0soft), score calculation speed (21592/sec), step total (82562).
2022-06-06 20:56:38,672 INFO Solving ended: time spent (186737), best score (0hard/0medium/0soft), score calculation speed (21684/sec), phase total (2), environment mode (REPRODUCIBLE), move thread count (4).

I don't use the solverConfig.xml. Many configuration parameters are passed in from the caller. Most of them are listed in the original post. I will attach the Java class to this post.

The results are consistently the same if moveThreadCount is set to 1. Unfortunately the solver takes 3 times as long.

Collections of planning entities and planning values are stored in LinkedList.

LessonSolverConfig.java

Fan Zeng

unread,

Jun 6, 2022, 9:38:32 PM6/6/22

to OptaPlanner development

By the way I am using version 8.20.0.Final

Radovan Synek

unread,

Jun 8, 2022, 4:06:29 AM6/8/22

to optapla...@googlegroups.com

Thanks for sharing the additional information. I assume the application always runs on the same machine, thus moveThreadCount=AUTO resolves always to the same number of threads.

It count be worth comparing debug logging [1] of two runs whose results differ to see in which step their paths forked.

[1] https://www.optaplanner.org/blog/2021/05/06/ThePathToOptaPlannerEnlightenmentStartsInTheLog.html

Best regards,

Radek

To view this discussion on the web visit https://groups.google.com/d/msgid/optaplanner-dev/27f07d78-ce4a-43cf-8d02-00371072070dn%40googlegroups.com.

Fan Zeng

unread,

Jun 9, 2022, 5:51:13 PM6/9/22

to OptaPlanner development

Yes the tests were run on the same machine and the actual number of threads was 4. I will try debugging.

Fan Zeng

unread,

Jun 11, 2022, 8:55:50 AM6/11/22

to OptaPlanner development

I had the debug on. In 2 consecutive runs of the same test case, the logs diverged at LS step 231. The same move resulted in 2 different scores. Still we don't know the cause of the divergence. How can I debug further?

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (228), time spent (4338), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/12), picked move (高三06班:物理:高丽丽:12:BOTH {12, BOTH} <-> 高三06班:英语:刘国英:43:BOTH {43, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (229), time spent (4338), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/6), picked move (高三07班:语文:杨晓丽语:33:BOTH {33, BOTH} <-> 高三07班:体育:秦伟:12:BOTH {12, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (230), time spent (4349), score (0hard/-6465medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高三06班:英语:刘国英:12:BOTH {12, BOTH} <-> 高三06班:语文:周月皎:28:BOTH {28, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4349), score (0hard/-6455medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (228), time spent (4277), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/12), picked move (高三06班:物理:高丽丽:12:BOTH {12, BOTH} <-> 高三06班:英语:刘国英:43:BOTH {43, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (229), time spent (4277), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/6), picked move (高三07班:语文:杨晓丽语:33:BOTH {33, BOTH} <-> 高三07班:体育:秦伟:12:BOTH {12, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (230), time spent (4283), score (0hard/-6465medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高三06班:英语:刘国英:12:BOTH {12, BOTH} <-> 高三06班:语文:周月皎:28:BOTH {28, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4286), score (0hard/-6475medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase

Fan Zeng

unread,

Jun 11, 2022, 10:56:15 AM6/11/22

to OptaPlanner development

On a third run, the divergent point move to LS step 232, where run #2 and #3 had different moves selected

Run #2

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4286), score (0hard/-6475medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (232), time spent (4286), score (0hard/-6455medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高二01班:化学:戴拓苏:45:BOTH {45, BOTH} <-> 高二01班:生物:耿静华:51:BOTH {51, BOTH}). [main]

Run #3

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4499), score (0hard/-6475medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (232), time spent (4500), score (0hard/-6465medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高三06班:语文:周月皎:12:BOTH {12, BOTH} <-> 高三06班:数学:陈静:34:BOTH {34, BOTH}). [main]

Radovan Synek

unread,

Jun 13, 2022, 5:22:46 AM6/13/22

to optapla...@googlegroups.com

You can turn on the FULL_ASSERT environment mode to rule out score corruption, although that would likely manifest even without multi-threaded solving.

Other things to consider:

- do you use any shared collection in your domain model that is changed during score calculation?

- any possible randomness in input data (e.g. a different order of entities/values in their collections)? Again, this should likely show up without multi-threaded solving.

If nothing of the above helps to put some light on it, could you please share a minimal reproducer, i.e. a test or a simplified standalone application that shows the problem?

Regards,

Radek

To view this discussion on the web visit https://groups.google.com/d/msgid/optaplanner-dev/f7ab518d-4c4f-494a-aaa3-261c9e971bbdn%40googlegroups.com.

Geoffrey De Smet

unread,

Jun 13, 2022, 7:00:10 AM6/13/22

to optapla...@googlegroups.com

Does your domain model return a HashSet or HashMap instance for @PlanningEntityCollectionProperty or @ValueRange?
(as opposed to LinkedHashSet or LinkedHashMap)

With kind regards,
Geoffrey De Smet

To view this discussion on the web visit https://groups.google.com/d/msgid/optaplanner-dev/CAGsrUGn2J24UGo9Q5LDw7p%3DY5XX1H%3DMZDZXFf4XrMOgQUODwWg%40mail.gmail.com.

Fan Zeng

unread,

Jun 15, 2022, 2:55:50 PM6/15/22

to OptaPlanner development

Thanks Geoffrey and Radek for you suggestions. The issue was resolved.

The root cause was in a "matchWeigher" lambda in the "penalize" method. The lambda take a list, sort the elements and calculates a weight based on the sorting result. Because duplicate values may occur in the primary sorting key, the sorting may differ for the same collection. A tie breaker was added to make the sorting result deterministic and it fixed the problem

Fan

Reply all

Reply to author

Forward