File _format_xxx has already been completed exception?

Kaiming Wan

Oct 13, 2016, 2:46:27 AM
to Alluxio Users, fanb...@gmail.com
Dear all,

    I recently stopped my Alluxio cluster (the UFS is HDFS). When I restarted it, the Alluxio master failed to start, with the following log in master.log:

2016-10-13 14:29:13,042 INFO  logger.type (AbstractMaster.java:start) - FileSystemMaster: process entire journal before becoming leader master.
2016-10-13 14:29:13,042 INFO  logger.type (JournalTailer.java:processJournalCheckpoint) - FileSystemMaster: Loading checkpoint file: hdfs://ns/alluxio/journal/FileSystemMaster/checkpoint.data
2016-10-13 14:29:13,043 INFO  logger.type (JournalReader.java:getCheckpointInputStream) - Opening journal checkpoint file: hdfs://ns/alluxio/journal/FileSystemMaster/checkpoint.data
2016-10-13 14:29:13,214 INFO  logger.type (JournalReader.java:getNextInputStream) - Opening journal log file: hdfs://ns/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001
2016-10-13 14:29:13,215 INFO  logger.type (JournalTailer.java:processNextJournalLogFiles) - FileSystemMaster: Processing a completed log file.
2016-10-13 14:29:13,218 ERROR logger.type (AlluxioMaster.java:main) - Uncaught exception while running Alluxio master, stopping it and exiting.
java.lang.RuntimeException: alluxio.exception.FileAlreadyCompletedException: File _format_1476172432684 has already been completed.
    at alluxio.master.file.FileSystemMaster.processJournalEntry(FileSystemMaster.java:323)
    at alluxio.master.journal.JournalTailer.processNextJournalLogFiles(JournalTailer.java:123)
    at alluxio.master.AbstractMaster.start(AbstractMaster.java:142)
    at alluxio.master.file.FileSystemMaster.start(FileSystemMaster.java:401)
    at alluxio.master.AlluxioMaster.startMasters(AlluxioMaster.java:391)
    at alluxio.master.FaultTolerantAlluxioMaster.start(FaultTolerantAlluxioMaster.java:87)
    at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:88)
Caused by: alluxio.exception.FileAlreadyCompletedException: File _format_1476172432684 has already been completed.
    at alluxio.master.file.meta.InodeFile.complete(InodeFile.java:236)
    at alluxio.master.file.FileSystemMaster.completeFileInternal(FileSystemMaster.java:710)
    at alluxio.master.file.FileSystemMaster.completeFileFromEntry(FileSystemMaster.java:734)
    at alluxio.master.file.FileSystemMaster.processJournalEntry(FileSystemMaster.java:321)
    ... 6 more
2016-10-13 14:29:13,220 INFO  logger.type (AlluxioMaster.java:stop) - Stopping Alluxio master @ /10.8.12.16:19998
2016-10-13 14:29:13,220 ERROR logger.type (LeaderSelectorClient.java:takeLeadership) - 10.8.12.16:19998 was interrupted.
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at alluxio.LeaderSelectorClient.takeLeadership(LeaderSelectorClient.java:178)
    at org.apache.curator.framework.recipes.leader.LeaderSelector$3.run(LeaderSelector.java:328)
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:319)
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:376)
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:48)
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:197)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-10-13 14:29:13,220 WARN  logger.type (LeaderSelectorClient.java:takeLeadership) - 10.8.12.16:19998 relinquishing leadership.




It seems that something is wrong with the file "_format_1476172432684". After I deleted this file, the log changed to:
2016-10-13 14:34:26,136 ERROR logger.type (AlluxioMaster.java:<init>) - Alluxio master was not formatted! The journal folder is hdfs://ns/alluxio/journal/
java.lang.IllegalStateException: Alluxio master was not formatted! The journal folder is hdfs://ns/alluxio/journal/
    at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
    at alluxio.master.AlluxioMaster.<init>(AlluxioMaster.java:247)
    at alluxio.master.FaultTolerantAlluxioMaster.<init>(FaultTolerantAlluxioMaster.java:45)
    at alluxio.master.AlluxioMaster$Factory.create(AlluxioMaster.java:206)
    at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:86)



How can I fix this issue? And what does FileAlreadyCompletedException mean?

Gene Pang

Oct 13, 2016, 11:41:26 AM
to Alluxio Users
Hi,

Is your journal location in the same directory tree as your Alluxio UFS? I don't think Alluxio should be managing a journal folder that sits inside the UFS directory. If you want to use the same HDFS cluster for both the UFS and the journal, please use separate directories for the two.

For example,

journal: hdfs://ip:port/alluxio/journal
ufs address: hdfs://ip:port/alluxio/data
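
The layout in the example above can be checked before starting the master. The following is only a minimal Python sketch, not part of Alluxio: the helper name `journal_inside_ufs` is hypothetical, and it assumes both locations are plain URIs that can be compared by scheme, authority, and path.

```python
from posixpath import normpath
from urllib.parse import urlparse


def journal_inside_ufs(journal_uri: str, ufs_uri: str) -> bool:
    """Return True if the journal path equals or lies under the UFS root.

    Hypothetical helper for illustration only; it is not an Alluxio API.
    """
    journal, ufs = urlparse(journal_uri), urlparse(ufs_uri)
    # Different scheme or authority means different storage namespaces.
    if (journal.scheme, journal.netloc) != (ufs.scheme, ufs.netloc):
        return False
    journal_path = normpath(journal.path or "/")
    ufs_path = normpath(ufs.path or "/")
    if ufs_path == "/":
        # The UFS root covers every path in that namespace.
        return True
    # Compare whole path components so "/a/journalx" is not under "/a/journal".
    return journal_path == ufs_path or journal_path.startswith(ufs_path + "/")
```

With the two addresses from the example, `journal_inside_ufs("hdfs://ip:port/alluxio/journal", "hdfs://ip:port/alluxio/data")` returns False, which is the layout you want. A UFS address of `hdfs://ip:port/alluxio` (an assumed value, for illustration) would make it return True.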

Thanks,
Gene

Kaiming Wan

Oct 13, 2016, 11:29:32 PM
to Alluxio Users
Hi, Gene Pang

Yes, my journal location is in the same directory tree as the UFS. I guess many people will use the same HDFS cluster for both the UFS and the journal, but the default value of "alluxio.master.journal.folder" points into the same directory tree. Maybe this could be designed better to avoid the issue.

Thanks Gene Pang, I will try your advice.

On Thursday, October 13, 2016 at 11:41:26 PM UTC+8, Gene Pang wrote:

Kaiming Wan

Oct 13, 2016, 11:31:42 PM
to Alluxio Users
Hi Gene Pang,

    Can you tell me what causes the "FileAlreadyCompletedException"? I would like to understand Alluxio better.

On Thursday, October 13, 2016 at 11:41:26 PM UTC+8, Gene Pang wrote:
Hi,

Gene Pang

Oct 17, 2016, 9:52:27 AM
to Alluxio Users
It means the file's metadata is supposed to be marked as completed in Alluxio, but the file already exists and is already completed. Completing it a second time causes a conflict. I think this is happening because the journal folder is being managed by Alluxio, which causes problems.

If you keep the journal folder separate from the Alluxio data, it should resolve the problem.
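
The conflict described above can be illustrated with a toy model. This is only a sketch in Python and not Alluxio's actual implementation: `ToyInodeTree`, `replay`, and `FileAlreadyCompletedError` are hypothetical names, and the assumption is simply that a "complete" journal entry is replayed against a file that the master's state already holds as completed.

```python
class FileAlreadyCompletedError(Exception):
    """Toy stand-in for alluxio.exception.FileAlreadyCompletedException."""


class ToyInodeTree:
    """Tracks, per file name, whether the file has been completed."""

    def __init__(self):
        self._completed = {}  # file name -> completed flag

    def create(self, name):
        # Creating an existing file keeps its current state.
        self._completed.setdefault(name, False)

    def complete(self, name):
        if name not in self._completed:
            raise KeyError(name)
        if self._completed[name]:
            raise FileAlreadyCompletedError(
                f"File {name} has already been completed.")
        self._completed[name] = True


def replay(tree, entries):
    """Apply (operation, file name) journal entries in order."""
    for op, name in entries:
        if op == "create":
            tree.create(name)
        elif op == "complete":
            tree.complete(name)
        else:
            raise ValueError(f"unknown journal operation: {op}")


if __name__ == "__main__":
    tree = ToyInodeTree()
    # State before replay: the file is already known and completed.
    replay(tree, [("create", "_format_example"), ("complete", "_format_example")])
    try:
        # A later journal entry tries to complete the same file again.
        replay(tree, [("complete", "_format_example")])
    except FileAlreadyCompletedError as err:
        print(err)
```

In this model the second "complete" entry fails because the state it is replayed against already marks the file as completed, which mirrors the kind of conflict the master log reports.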

Thanks,
Gene

Kaiming Wan

Oct 17, 2016, 10:14:54 PM
to Alluxio Users
Thanks. The issue is solved with your advice.

On Monday, October 17, 2016 at 9:52:27 PM UTC+8, Gene Pang wrote:

Gene Pang

Oct 18, 2016, 12:09:09 AM
to Alluxio Users
Thanks for confirming.

-Gene