Reproducible Issue

88 views
Skip to first unread message

Fan Zeng

unread,
Jun 4, 2022, 9:12:45 AM6/4/22
to OptaPlanner development
Hi

First allow me to share a success story. I am developing a timetabling solution for primary/secondary schools. There were 2 separate implementations, one in OptaPlanner and the other in Google's ortools. The OptaPlanner version is about 10 times faster with real world data. It is also more tolerant of user-defined constraints that contradict each other.

Thanks for making a great product!

I would seek your help with the reproducible issue. For the same problem data, the results are always between runs. The scores also change but less frequently. Here are the configurations for the solver.

Phases: default construction heuristic followed by step counting hill climb
moveThreadCount=AUTO
terminationConfig=bestScoreLimit of (hard, medium, soft) = (0, 0, 0) 
    or minuteSpentLimit of 15 minutes (never reached, all runs were under 10 minutes)
    or unimprovedStepCount=10000
stepCountingHillClimbingSize = 600
acceptedCountLimit=6
swapMoveSelector.cacheType = PHASE
a custom filter on SwapMove

Planning entities or planning values are stored in Java List.


Radovan Synek

unread,
Jun 6, 2022, 6:55:35 AM6/6/22
to optapla...@googlegroups.com
Hello,

it's interesting that even the score changes (although, as you wrote, less frequently) despite the fact that you configured bestScoreLimit termination. Was the score ever worse than "0hard/0medium/0soft"?

Do you use the reproducible environment mode [1] (the default one)? Ideally, could you share the solverConfig.xml?


Best regards,
Radek


--
You received this message because you are subscribed to the Google Groups "OptaPlanner development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to optaplanner-d...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/optaplanner-dev/02a6be68-bc00-4fa1-be54-131bd2d41ec6n%40googlegroups.com.


--

Radovan Synek

Principal Software Engineer, Business Automation

Red Hat EMEA

Brno, Czech Republic

Fan Zeng

unread,
Jun 6, 2022, 9:19:55 PM6/6/22
to OptaPlanner development
Thank you Radovan.

The optimal (highest possible) score is  0hard/0medium/0soft. Constraints have penalties and no rewards. A negative score means there were unsatisfied constraints.

The reproducible environment mode is the default as confirmed in the log.

2022-06-06 20:53:32,270 INFO Solving started: time spent (336), best score (-3796init/0hard/0medium/0soft), environment mode (REPRODUCIBLE), move thread count (4), random (JDK with seed 0).
2022-06-06 20:53:35,241 INFO Construction Heuristic phase (0) ended: time spent (3307), best score (0hard/-6610medium/0soft), score calculation speed (29962/sec), step total (1898).
2022-06-06 20:56:38,671 INFO Local Search phase (1) ended: time spent (186737), best score (0hard/0medium/0soft), score calculation speed (21592/sec), step total (82562).
2022-06-06 20:56:38,672 INFO Solving ended: time spent (186737), best score (0hard/0medium/0soft), score calculation speed (21684/sec), phase total (2), environment mode (REPRODUCIBLE), move thread count (4).

I don't use the solverConfig.xml. Many configuration parameters are passed in from the caller. Most of them are listed in the original post. I will attach the Java class to this post. 

The results are consistently the same if moveThreadCount is set to 1. Unfortunately the solver takes 3 times as long. 

Collections of planning entities and planning values are stored in LinkedList. 
LessonSolverConfig.java

Fan Zeng

unread,
Jun 6, 2022, 9:38:32 PM6/6/22
to OptaPlanner development
By the way I am using version 8.20.0.Final

Radovan Synek

unread,
Jun 8, 2022, 4:06:29 AM6/8/22
to optapla...@googlegroups.com
Thanks for sharing the additional information. I assume the application always runs on the same machine, thus moveThreadCount=AUTO resolves always to the same number of threads.

It count be worth comparing debug logging [1] of two runs whose results differ to see in which step their paths forked.


Best regards,
Radek

Fan Zeng

unread,
Jun 9, 2022, 5:51:13 PM6/9/22
to OptaPlanner development
Yes the tests were run on the same machine and the actual number of threads was 4. I will try debugging.

Fan Zeng

unread,
Jun 11, 2022, 8:55:50 AM6/11/22
to OptaPlanner development
I had the debug on. In 2 consecutive runs of the same test case, the logs diverged at LS step 231. The same move resulted in 2 different scores. Still we don't know the cause of the divergence. How can I debug further?

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (228), time spent (4338), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/12), picked move (高三06班:物理:高丽丽:12:BOTH {12, BOTH} <-> 高三06班:英语:刘国英:43:BOTH {43, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (229), time spent (4338), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/6), picked move (高三07班:语文:杨晓丽语:33:BOTH {33, BOTH} <-> 高三07班:体育:秦伟:12:BOTH {12, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (230), time spent (4349), score (0hard/-6465medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高三06班:英语:刘国英:12:BOTH {12, BOTH} <-> 高三06班:语文:周月皎:28:BOTH {28, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4349), score (0hard/-6455medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (228), time spent (4277), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/12), picked move (高三06班:物理:高丽丽:12:BOTH {12, BOTH} <-> 高三06班:英语:刘国英:43:BOTH {43, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (229), time spent (4277), score (0hard/-6480medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/6), picked move (高三07班:语文:杨晓丽语:33:BOTH {33, BOTH} <-> 高三07班:体育:秦伟:12:BOTH {12, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (230), time spent (4283), score (0hard/-6465medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高三06班:英语:刘国英:12:BOTH {12, BOTH} <-> 高三06班:语文:周月皎:28:BOTH {28, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4286), score (0hard/-6475medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase

Fan Zeng

unread,
Jun 11, 2022, 10:56:15 AM6/11/22
to OptaPlanner development
On a third run, the divergent point move to LS step 232, where run #2 and #3 had different moves selected


Run #2

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4286), score (0hard/-6475medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (232), time spent (4286), score (0hard/-6455medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高二01班:化学:戴拓苏:45:BOTH {45, BOTH} <-> 高二01班:生物:耿静华:51:BOTH {51, BOTH}). [main]

Run #3

o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (231), time spent (4499), score (0hard/-6475medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/7), picked move (高二09班:生物:耿静华:21:BOTH {21, BOTH} <-> 高二09班:体育:陈文杰:44:BOTH {44, BOTH}). [main]
o.o.core.impl.localsearch.DefaultLocalSearchPhase DEBUG LS step (232), time spent (4500), score (0hard/-6465medium/0soft), best score (0hard/-6350medium/0soft), accepted/selected move count (6/10), picked move (高三06班:语文:周月皎:12:BOTH {12, BOTH} <-> 高三06班:数学:陈静:34:BOTH {34, BOTH}). [main]

Radovan Synek

unread,
Jun 13, 2022, 5:22:46 AM6/13/22
to optapla...@googlegroups.com
You can turn on the FULL_ASSERT environment mode to rule out score corruption, although that would likely manifest even without multi-threaded solving.

Other things to consider:
- do you use any shared collection in your domain model that is changed during score calculation?
- any possible randomness in input data (e.g. a different order of entities/values in their collections)? Again, this should likely show up without multi-threaded solving.

If nothing of the above helps to put some light on it, could you please share a minimal reproducer, i.e. a test or a simplified standalone application that shows the problem?

Regards,
Radek

Geoffrey De Smet

unread,
Jun 13, 2022, 7:00:10 AM6/13/22
to optapla...@googlegroups.com

Does your domain model return a HashSet or HashMap instance for @PlanningEntityCollectionProperty or @ValueRange?
(as opposed to LinkedHashSet or LinkedHashMap)

With kind regards,
Geoffrey De Smet

Fan Zeng

unread,
Jun 15, 2022, 2:55:50 PM6/15/22
to OptaPlanner development
Thanks Geoffrey and Radek for you suggestions. The issue was resolved.

The root cause was in a "matchWeigher" lambda in the "penalize" method. The lambda take a list, sort the elements and calculates a weight based on the sorting result. Because duplicate values may occur in the primary sorting key, the sorting may differ for the same collection. A tie breaker was added to make the sorting result deterministic and it fixed the problem

Fan

Reply all
Reply to author
Forward
0 new messages