Hi guys,
Previously we were having some misunderstanding about divide-and-conquer and parallelism. First of all, they are not equivalent to each other. At each level of divide-and-conquer, parallelism can be performed. But as you go up one more level, there is half less parallelism left (only half nodes to work simultaneously). At the root (or final conquer step), only one node is working.
Based on the above misunderstanding, we thought the slow-dynamic programming algorithm was easier to be paralleled (since it has a well-formed divide-and-conquer version), which was not the case. Instead, after days of thinking, I found the Floyd-Marshall algorithm was easier to be paralleled. Additionally, I already implemented the distributed version by using multiple threads to simulate multiple nodes. The result was validated. I wish I can meet with you guys some day next week to share this finding in detail. With the paralleled algorithm figured out, I think the only thing left to do is to find a better platform that allows us to migrate it onto.
Cheers~
-Dongxi