Dear all,
I intend to write a blog article about FreeEed. To that end, I downloaded the VM with FreeEed v4.4 and tried to load some Enron PST files into a test case. When I tried to process the project locally, I got the following error:
reeeed.main.ActionProcessing - Processing project: Enron Test
2014-10-21 14:26:17 192940 [Thread-0] INFO org.freeeed.main.MRFreeEedProcess - Running Hadoop job
2014-10-21 14:26:17 192948 [Thread-0] INFO org.freeeed.main.MRFreeEedProcess - Input project file = output/freeeed-output/1002/output//staging/inventory
2014-10-21 14:26:17 192950 [Thread-0] INFO org.freeeed.main.MRFreeEedProcess - Output path = output/freeeed-output/1002/output//results
2014-10-21 14:26:17 193520 [Thread-0] DEBUG org.freeeed.main.MRFreeEedProcess - project.isEnvHadoop() = false
2014-10-21 14:26:17 193525 [Thread-0] DEBUG org.freeeed.main.MRFreeEedProcess - Ready to run, inputPath = output/freeeed-output/1002/output//staging/inventory, outputPath = output/freeeed-output/1002/output//results
2014-10-21 14:26:18 193588 [Thread-0] TRACE org.freeeed.main.MRFreeEedProcess - Project
2014-10-21 14:26:18 193592 [Thread-0] TRACE org.freeeed.main.MRFreeEedProcess - create-pdf=false
culling=
custodian=Michael Maggie
field-separator=pipe
file-system=local
files-per-zip-staging=100
gigs-per-zip-staging=1.0
input=/mnt/evidence/RevisedEDRMv1_Complete/RevisedEDRMv1_Complete/michael_maggi/michael_maggi_000_1_1_1.pst
lucene_fs_index_enabled=true
metadata=standard
new-project-name=New project
ocr_enabled=true
ocr_max_images_per_pdf=10
output-dir-hadoop=output/freeeed-output/1002/output//results
output_dir=test-output/output
preview=true
process-where=local
project-code=1002
project-file-name=default.freeeed.properties
project-file-path=/mnt/freeed/Enron Test/NewFolder.project
project-name=Enron Test
remove-system-files=false
send_index_solr_enabled=false
skip=0
staging_dir=test-output/staging
2014-10-21 14:26:18 193680 [Thread-0] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-10-21 14:26:19 195011 [Thread-0] ERROR org.freeeed.main.ActionProcessing - Running action processing
java.lang.IllegalStateException: Input path does not exist: file:/home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory
at org.freeeed.main.ActionProcessing.process(ActionProcessing.java:79)
at org.freeeed.main.ActionProcessing.run(ActionProcessing.java:47)
at java.lang.Thread.run(Thread.java:744)
To work around this, I created the missing directory structure
/home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory (every directory from /home/ubuntu/freeeed_complete_pack/FreeEed/output downwards).
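The log prints relative paths (output/freeeed-output/1002/output//staging/inventory), while the exception reports an absolute one, so they appear to be resolved against the directory FreeEed was started from. Below is a small Java sketch of that resolution which also creates the directories the same way I did by hand. It assumes the working directory is /home/ubuntu/freeeed_complete_pack/FreeEed (inferred from the error message, not checked against the FreeEed source); the class and method names are my own, not part of FreeEed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InventoryPath {

    // Resolve a relative path, as printed in the FreeEed log, against a base directory.
    // On Unix, Paths.get already drops the redundant "//" while parsing;
    // normalize() additionally removes any "." or ".." segments.
    static Path resolve(String baseDir, String relative) {
        return Paths.get(baseDir).resolve(relative).normalize();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: FreeEed runs with its install directory as the working directory,
        // as the absolute path in the IllegalStateException suggests.
        String base = args.length > 0 ? args[0] : "/home/ubuntu/freeeed_complete_pack/FreeEed";
        Path inventory = resolve(base, "output/freeeed-output/1002/output//staging/inventory");
        System.out.println("Resolved: " + inventory);

        // The manual workaround: create every missing directory along the path.
        // Throws FileAlreadyExistsException if a regular file already sits there.
        Files.createDirectories(inventory);
        System.out.println("Is directory: " + Files.isDirectory(inventory));
    }
}
```

It can be run with `java InventoryPath.java` (Java 11 or later), optionally passing a different base directory as the first argument.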
Rerunning the processing job produced the following output:
2014-10-21 14:28:02 298516 [Thread-2] ERROR org.freeeed.main.ActionProcessing - Running action processing
java.lang.IllegalStateException: Input path does not exist: file:/home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory
at org.freeeed.main.ActionProcessing.process(ActionProcessing.java:79)
at org.freeeed.main.ActionProcessing.run(ActionProcessing.java:47)
at java.lang.Thread.run(Thread.java:744)
2014-10-21 14:33:04 600563 [Thread-3] INFO org.freeeed.main.ActionProcessing - Processing project: Enron Test
2014-10-21 14:33:05 600597 [Thread-3] INFO org.freeeed.main.MRFreeEedProcess - Running Hadoop job
2014-10-21 14:33:05 600599 [Thread-3] INFO org.freeeed.main.MRFreeEedProcess - Input project file = output/freeeed-output/1002/output//staging/inventory
2014-10-21 14:33:05 600599 [Thread-3] INFO org.freeeed.main.MRFreeEedProcess - Output path = output/freeeed-output/1002/output//results
2014-10-21 14:33:05 600600 [Thread-3] DEBUG org.freeeed.main.MRFreeEedProcess - project.isEnvHadoop() = false
2014-10-21 14:33:05 600602 [Thread-3] DEBUG org.freeeed.main.MRFreeEedProcess - Ready to run, inputPath = output/freeeed-output/1002/output//staging/inventory, outputPath = output/freeeed-output/1002/output//results
2014-10-21 14:33:05 600602 [Thread-3] TRACE org.freeeed.main.MRFreeEedProcess - Project
2014-10-21 14:33:05 600602 [Thread-3] TRACE org.freeeed.main.MRFreeEedProcess - create-pdf=false
culling=
custodian=Michael Maggie
field-separator=pipe
file-system=local
files-per-zip-staging=100
gigs-per-zip-staging=1.0
input=/mnt/evidence/RevisedEDRMv1_Complete/RevisedEDRMv1_Complete/michael_maggi/michael_maggi_000_1_1_1.pst
lucene_fs_index_enabled=true
metadata=standard
new-project-name=New project
ocr_enabled=true
ocr_max_images_per_pdf=10
output-dir-hadoop=output/freeeed-output/1002/output//results
output_dir=test-output/output
preview=true
process-where=local
project-code=1002
project-file-name=default.freeeed.properties
project-file-path=/mnt/freeed/Enron Test/NewFolder.project
project-name=Enron Test
remove-system-files=false
send_index_solr_enabled=false
skip=0
staging_dir=test-output/staging
2014-10-21 14:33:05 601553 [Thread-5] WARN org.freeeed.services.Settings - Error parsing line recent-projects=
2014-10-21 14:33:06 601593 [Thread-5] DEBUG org.freeeed.main.FreeEedConfiguration - FileName set to config/standard-metadata-names.properties
2014-10-21 14:33:06 601605 [Thread-5] DEBUG org.freeeed.main.FreeEedConfiguration - Base path set to file:///home/ubuntu/freeeed_complete_pack/FreeEed/config/standard-metadata-names.properties
2014-10-21 14:33:06 601643 [Thread-5] INFO org.freeeed.main.ZipFileWriter - Filename: output/freeeed-output/1002/output//results/native.zip, Root dir: output/freeeed-output/1002/output//results
2014-10-21 14:33:06 601835 [Thread-5] TRACE org.freeeed.main.PlatformUtil - Running command: hadoop fs -copyToLocal /output/lucene_index/1002/* output/lucene_index/tmp/
2014-10-21 14:33:06 601844 [Thread-5] WARN org.freeeed.main.PlatformUtil - Could not run the following command: hadoop fs -copyToLocal /output/lucene_index/1002/* output/lucene_index/tmp/
2014-10-21 14:33:06 601844 [Thread-5] TRACE org.freeeed.main.PlatformUtil - Running command: hadoop fs -rm /output/lucene_index/1002/*
2014-10-21 14:33:06 601844 [Thread-5] WARN org.freeeed.main.PlatformUtil - Could not run the following command: hadoop fs -rm /output/lucene_index/1002/*
2014-10-21 14:33:06 601845 [Thread-5] TRACE org.freeeed.main.Reduce - Lucene index files collected to: /home/ubuntu/freeeed_complete_pack/FreeEed/output/lucene_index/tmp
2014-10-21 14:33:09 605441 [Thread-3] INFO org.freeeed.main.ActionProcessing - Processing done
Even creating a new case along with the corresponding folder structure in the output directory did not fix the problem, and rebooting the VM did not change anything either.
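The log labels this path "Input project file", which suggests (unconfirmed) that it may be a file written by the staging step rather than a directory, in which case creating a directory would not help. A hedged Java sketch to check what is actually on disk there; the default path is taken from the exception message, and the class and method names are hypothetical, not part of FreeEed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class InventoryCheck {

    // Report whether the path is missing, a directory, a regular file, or something else.
    // Files.exists follows symlinks, so a broken link is reported as "missing".
    static String describe(Path p) {
        if (!Files.exists(p)) {
            return "missing";
        }
        if (Files.isDirectory(p)) {
            return "directory";
        }
        if (Files.isRegularFile(p)) {
            return "file";
        }
        return "other";
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the IllegalStateException message; pass another path to override.
        Path inventory = Paths.get(args.length > 0 ? args[0]
                : "/home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory");
        System.out.println(inventory + " -> " + describe(inventory));

        // Inspect the parent (staging) directory as well and list what it contains.
        Path staging = inventory.toAbsolutePath().getParent();
        System.out.println(staging + " -> " + describe(staging));
        if (Files.isDirectory(staging)) {
            try (Stream<Path> entries = Files.list(staging)) {
                entries.forEach(e -> System.out.println("  " + e.getFileName() + " (" + describe(e) + ")"));
            }
        }
    }
}
```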
I have used the VM exactly as provided on the homepage, without any changes apart from installing the Windows guest additions.
Any help is welcome!
Best regards
Dennis