Hello everybody:
I found this issue:
https://alluxio.atlassian.net/browse/ALLUXIO-3115
and I ran into the same problem. The cause, as far as I can tell, is that the journal writer keeps log.out open, so it never gets a chance to release the lease on the file. I think the same thing can happen if I kill the master with signal 9 (SIGKILL).

When this happened, my workaround was to delete the log file under the completed folder. My data is persisted and the MEM tier is larger than the data set, so at first the system ran without problems. But once some data had been evicted to the SSD tier and I deleted the journal file again, which amounts to rolling back the master state, a new block ID created by the master may already exist in a worker's block store. As a result I can now neither write data nor read data from the UFS.
Instead of deleting the log file, I have now added some code that recovers the file lease, which solves the HDFS restart problem. However, I have no idea how to handle the inconsistency. Should I add a method that finds and deletes the blocks that workers hold but whose IDs are absent from the master? I think that data would then be read from the UFS again and assigned new block IDs that are consistent with the master.
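
To make the question concrete, here is a minimal sketch of the check I have in mind, written as plain Java with no Alluxio classes. The class name OrphanBlockFinder, the method findOrphans, and the two ID sets are hypothetical and only for illustration; in a real patch the IDs would have to come from the master's block metadata and from the worker's block store. I assume the check has to run before the master hands out any new block IDs, otherwise a stale block could no longer be told apart from a valid one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Alluxio: it only illustrates the set difference
// "blocks stored on a worker" minus "blocks known to the master".
final class OrphanBlockFinder {

    // Returns the block IDs held by the worker that the master has no record of.
    // A TreeSet is used only so that the result is sorted and easy to read in logs.
    static Set<Long> findOrphans(Set<Long> masterBlockIds, Set<Long> workerBlockIds) {
        Set<Long> orphans = new TreeSet<>(workerBlockIds);
        orphans.removeAll(masterBlockIds);
        return orphans;
    }

    public static void main(String[] args) {
        // Made-up IDs: the master was rolled back and only knows blocks 1 to 3,
        // while the worker still holds blocks 4 and 5 from before the rollback.
        Set<Long> master = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> worker = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L));
        System.out.println(findOrphans(master, worker)); // prints [4, 5]
    }
}
```

The blocks returned here would be the candidates for deletion on the worker. Since the data is persisted, I would expect it to be loaded from the UFS again on the next read.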
Sean