It seems I can no longer run garbage collection on my 0.5.4 Disco cluster.
disco@disco-master-9635:/disco/log$ grep "GC: stopping" error.log.*
error.log.0:2016-04-15 10:27:00.886 [error] <0.4386.48>@ddfs_gc_main:handle_cast:387 GC: stopping, unable to get tag <<"wexin:2015:11:20:07:16:51:bowie-node-6625">>: {error,timeout}
error.log.0:2016-04-15 13:01:02.066 [error] <0.11431.59>@ddfs_gc_main:handle_cast:387 GC: stopping, unable to get tag <<"wexin:2015:11:20:07:16:51:bowie-node-6625">>: {error,timeout}
error.log.1:2016-04-14 16:47:26.320 [error] <0.3981.29>@ddfs_gc_main:handle_cast:387 GC: stopping, unable to get tag <<"wexin:2015:11:20:07:16:51:bowie-node-6625">>: {error,timeout}
error.log.1:2016-04-14 21:04:51.049 [error] <0.8738.36>@ddfs_gc_main:handle_cast:387 GC: stopping, unable to get tag <<"wexin:2015:11:20:07:16:51:bowie-node-6625">>: {error,timeout}
error.log.1:2016-04-14 22:40:18.085 [error] <0.4831.42>@ddfs_gc_main:handle_cast:387 GC: stopping, unable to get tag <<"wexin:2015:11:20:07:16:51:bowie-node-6625">>: {error,timeout}
error.log.2:2016-04-13 19:11:44.951 [error] <0.29076.15>@ddfs_gc_main:handle_cast:387 GC: stopping, unable to get tag <<"wexin:2015:11:20:07:16:51:bowie-node-6625">>: {error,timeout}
I was imagining that the "unable to get tag" message was a sign of a timeout while fetching that particular tag's data, but I can successfully "ddfs get" that tag from the command line. It seems significant that it is always the same tag that fails. That suggests something persistent (on disk) rather than a node being randomly slow.
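To check that it really is always the same tag across all the rotated logs, something like the following rough sketch could tally the failures per tag. The regular expression is inferred only from the log lines quoted above, not from the Disco source, and the function name `count_gc_failures` is just a hypothetical helper for illustration.

```python
import glob
import re
from collections import Counter

# Pattern inferred from the sample "GC: stopping" lines above (an assumption,
# not taken from the Disco source): captures the tag name and the error reason.
GC_STOP = re.compile(
    r'GC: stopping, unable to get tag <<"([^"]+)">>: \{error,(\w+)\}'
)


def count_gc_failures(lines):
    """Return a Counter keyed by (tag, reason) for every GC stop line."""
    counts = Counter()
    for line in lines:
        match = GC_STOP.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Read every rotated error log in the current directory (e.g. /disco/log).
    lines = []
    for path in sorted(glob.glob("error.log.*")):
        with open(path, errors="replace") as fh:
            lines.extend(fh)
    # Print the most frequently failing tags first.
    for (tag, reason), n in count_gc_failures(lines).most_common():
        print(n, tag, reason)
```

If this prints a single line, every GC abort in the retained logs is down to that one tag and reason.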
Any suggestions on how to investigate this problem would be gratefully received.