Processing fails with VM

Dennis Pielken

Oct 21, 2014, 4:02:29 PM
to freedi...@googlegroups.com
Dear all,

I intend to write a blog article about FreeEed. For that purpose, I downloaded the VM with FreeEed v4.4 and tried to load some Enron PST files into a test case. When I tried to process the project locally, I got the following error:

reeeed.main.ActionProcessing  - Processing project: Enron Test
2014-10-21 14:26:17 192940 [Thread-0] INFO  org.freeeed.main.MRFreeEedProcess  - Running Hadoop job
2014-10-21 14:26:17 192948 [Thread-0] INFO  org.freeeed.main.MRFreeEedProcess  - Input project file = output/freeeed-output/1002/output//staging/inventory
2014-10-21 14:26:17 192950 [Thread-0] INFO  org.freeeed.main.MRFreeEedProcess  - Output path = output/freeeed-output/1002/output//results
2014-10-21 14:26:17 193520 [Thread-0] DEBUG org.freeeed.main.MRFreeEedProcess  - project.isEnvHadoop() = false
2014-10-21 14:26:17 193525 [Thread-0] DEBUG org.freeeed.main.MRFreeEedProcess  - Ready to run, inputPath = output/freeeed-output/1002/output//staging/inventory, outputPath = output/freeeed-output/1002/output//results
2014-10-21 14:26:18 193588 [Thread-0] TRACE org.freeeed.main.MRFreeEedProcess  - Project
2014-10-21 14:26:18 193592 [Thread-0] TRACE org.freeeed.main.MRFreeEedProcess  - create-pdf=false
culling=
custodian=Michael Maggie
field-separator=pipe
file-system=local
files-per-zip-staging=100
gigs-per-zip-staging=1.0
input=/mnt/evidence/RevisedEDRMv1_Complete/RevisedEDRMv1_Complete/michael_maggi/michael_maggi_000_1_1_1.pst
lucene_fs_index_enabled=true
metadata=standard
new-project-name=New project
ocr_enabled=true
ocr_max_images_per_pdf=10
output-dir-hadoop=output/freeeed-output/1002/output//results
output_dir=test-output/output
preview=true
process-where=local
project-code=1002
project-file-name=default.freeeed.properties
project-file-path=/mnt/freeed/Enron Test/NewFolder.project
project-name=Enron Test
remove-system-files=false
send_index_solr_enabled=false
skip=0
staging_dir=test-output/staging

2014-10-21 14:26:18 193680 [Thread-0] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-10-21 14:26:19 195011 [Thread-0] ERROR org.freeeed.main.ActionProcessing  - Running action processing
java.lang.IllegalStateException: Input path does not exist: file:/home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory
    at org.freeeed.main.ActionProcessing.process(ActionProcessing.java:79)
    at org.freeeed.main.ActionProcessing.run(ActionProcessing.java:47)
    at java.lang.Thread.run(Thread.java:744)
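The relative input path logged by MRFreeEedProcess (output/freeeed-output/1002/output//staging/inventory) and the absolute path in the exception differ only by the prefix /home/ubuntu/freeeed_complete_pack/FreeEed and the doubled slash. That is consistent with the relative path being resolved against the JVM's working directory, although the log does not say so explicitly. A minimal Java sketch of that resolution with java.nio.file, assuming the working directory was /home/ubuntu/freeeed_complete_pack/FreeEed; the class and method names are hypothetical, and this is not FreeEed or Hadoop code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration, not FreeEed or Hadoop code: resolve the relative
// input path from the log against an assumed working directory.
public class ResolveInputPath {

    // Join the relative path onto the base directory and remove "." / ".." elements.
    // On Unix, parsing the path string already collapses the doubled slash.
    static Path resolve(String workingDir, String relativeInput) {
        return Paths.get(workingDir).resolve(relativeInput).normalize();
    }

    public static void main(String[] args) {
        // Assumption: the process was started from the FreeEed installation directory.
        String workingDir = args.length > 0 ? args[0] : "/home/ubuntu/freeeed_complete_pack/FreeEed";
        String input = "output/freeeed-output/1002/output//staging/inventory";
        System.out.println(resolve(workingDir, input));
    }
}
```

On Linux, running it without arguments prints /home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory, the same path that appears in the exception.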

To try to fix this problem, I created the directory structure /home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory (every directory below /home/ubuntu/freeeed_complete_pack/FreeEed/output).
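The manual workaround described above amounts to creating every missing directory on a single path. A minimal Java sketch using Files.createDirectories, which creates any missing parents and does not fail if the directory already exists; the class name is hypothetical, and the default base directory is taken from the exception message and should be adjusted for other installations:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper reproducing the manual workaround: create the missing
// staging/inventory directory tree, including any missing parent directories.
public class CreateStagingDirs {

    // Files.createDirectories creates every missing directory on the path and
    // does nothing if the directory already exists.
    static Path ensureDirectory(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: base directory taken from the exception message; pass another as args[0].
        Path base = Paths.get(args.length > 0 ? args[0] : "/home/ubuntu/freeeed_complete_pack/FreeEed");
        Path inventory = base.resolve("output/freeeed-output/1002/output/staging/inventory");
        ensureDirectory(inventory);
        System.out.println("Is directory: " + Files.isDirectory(inventory));
    }
}
```

This only reproduces the manual step. The log refers to this path as the "Input project file", so an empty directory at that location may satisfy the existence check without giving the job anything to process.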

Rerunning the processing job produced the following error:
2014-10-21 14:28:02 298516 [Thread-2] ERROR org.freeeed.main.ActionProcessing  - Running action processing
java.lang.IllegalStateException: Input path does not exist: file:/home/ubuntu/freeeed_complete_pack/FreeEed/output/freeeed-output/1002/output/staging/inventory
    at org.freeeed.main.ActionProcessing.process(ActionProcessing.java:79)
    at org.freeeed.main.ActionProcessing.run(ActionProcessing.java:47)
    at java.lang.Thread.run(Thread.java:744)
2014-10-21 14:33:04 600563 [Thread-3] INFO  org.freeeed.main.ActionProcessing  - Processing project: Enron Test
2014-10-21 14:33:05 600597 [Thread-3] INFO  org.freeeed.main.MRFreeEedProcess  - Running Hadoop job
2014-10-21 14:33:05 600599 [Thread-3] INFO  org.freeeed.main.MRFreeEedProcess  - Input project file = output/freeeed-output/1002/output//staging/inventory
2014-10-21 14:33:05 600599 [Thread-3] INFO  org.freeeed.main.MRFreeEedProcess  - Output path = output/freeeed-output/1002/output//results
2014-10-21 14:33:05 600600 [Thread-3] DEBUG org.freeeed.main.MRFreeEedProcess  - project.isEnvHadoop() = false
2014-10-21 14:33:05 600602 [Thread-3] DEBUG org.freeeed.main.MRFreeEedProcess  - Ready to run, inputPath = output/freeeed-output/1002/output//staging/inventory, outputPath = output/freeeed-output/1002/output//results
2014-10-21 14:33:05 600602 [Thread-3] TRACE org.freeeed.main.MRFreeEedProcess  - Project
2014-10-21 14:33:05 600602 [Thread-3] TRACE org.freeeed.main.MRFreeEedProcess  - create-pdf=false
culling=
custodian=Michael Maggie
field-separator=pipe
file-system=local
files-per-zip-staging=100
gigs-per-zip-staging=1.0
input=/mnt/evidence/RevisedEDRMv1_Complete/RevisedEDRMv1_Complete/michael_maggi/michael_maggi_000_1_1_1.pst
lucene_fs_index_enabled=true
metadata=standard
new-project-name=New project
ocr_enabled=true
ocr_max_images_per_pdf=10
output-dir-hadoop=output/freeeed-output/1002/output//results
output_dir=test-output/output
preview=true
process-where=local
project-code=1002
project-file-name=default.freeeed.properties
project-file-path=/mnt/freeed/Enron Test/NewFolder.project
project-name=Enron Test
remove-system-files=false
send_index_solr_enabled=false
skip=0
staging_dir=test-output/staging

2014-10-21 14:33:05 601553 [Thread-5] WARN  org.freeeed.services.Settings  - Error parsing line recent-projects=
2014-10-21 14:33:06 601593 [Thread-5] DEBUG org.freeeed.main.FreeEedConfiguration  - FileName set to config/standard-metadata-names.properties
2014-10-21 14:33:06 601605 [Thread-5] DEBUG org.freeeed.main.FreeEedConfiguration  - Base path set to file:///home/ubuntu/freeeed_complete_pack/FreeEed/config/standard-metadata-names.properties
2014-10-21 14:33:06 601643 [Thread-5] INFO  org.freeeed.main.ZipFileWriter  - Filename: output/freeeed-output/1002/output//results/native.zip, Root dir: output/freeeed-output/1002/output//results
2014-10-21 14:33:06 601835 [Thread-5] TRACE org.freeeed.main.PlatformUtil  - Running command: hadoop fs -copyToLocal /output/lucene_index/1002/* output/lucene_index/tmp/
2014-10-21 14:33:06 601844 [Thread-5] WARN  org.freeeed.main.PlatformUtil  - Could not run the following command: hadoop fs -copyToLocal /output/lucene_index/1002/* output/lucene_index/tmp/
2014-10-21 14:33:06 601844 [Thread-5] TRACE org.freeeed.main.PlatformUtil  - Running command: hadoop fs -rm /output/lucene_index/1002/*
2014-10-21 14:33:06 601844 [Thread-5] WARN  org.freeeed.main.PlatformUtil  - Could not run the following command: hadoop fs -rm /output/lucene_index/1002/*
2014-10-21 14:33:06 601845 [Thread-5] TRACE org.freeeed.main.Reduce  - Lucene index files collected to: /home/ubuntu/freeeed_complete_pack/FreeEed/output/lucene_index/tmp
2014-10-21 14:33:09 605441 [Thread-3] INFO  org.freeeed.main.ActionProcessing  - Processing done

Even creating a new case and the matching folder structure in the output directory did not fix the problem. Rebooting the VM did not change anything either.

I have used the VM as provided on the homepage without any changes, except for installing the Windows guest additions.

Any help is welcome!

Best regards
Dennis

Mark Kerzner

Oct 21, 2014, 4:05:16 PM
to freedi...@googlegroups.com
Dennis,

let me re-test and get back to you. Can you point me to your blog?

Thank you,
Mark

--
You received this message because you are subscribed to the Google Groups "freediscovery" group.
To unsubscribe from this group and stop receiving emails from it, send an email to freediscover...@googlegroups.com.
To post to this group, send email to freedi...@googlegroups.com.
Visit this group at http://groups.google.com/group/freediscovery.
For more options, visit https://groups.google.com/d/optout.

Dennis Pielken

Oct 21, 2014, 4:11:54 PM
to freedi...@googlegroups.com
Hi Mark,

Thank you! Just to clarify: I do get output, but everything is empty (metadata file, native.zip, etc.).
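One way to confirm that native.zip really contains no files is to count its entries. A minimal Java sketch using java.util.zip.ZipFile; the class name is hypothetical, and the default path is the results directory named in the ZipFileWriter log line (with the doubled slash removed), relative to the FreeEed working directory:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipFile;

// Hypothetical helper: report how many entries a zip archive contains.
public class ZipEntryCount {

    // Returns the entry count. A zero-byte file is not a valid archive and
    // ZipFile rejects it, so it is reported as 0 entries instead.
    static int countEntries(Path zipPath) {
        try {
            if (Files.size(zipPath) == 0) {
                return 0;
            }
            try (ZipFile zip = new ZipFile(zipPath.toFile())) {
                return zip.size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: run from the FreeEed directory; pass another path as args[0].
        Path zip = Paths.get(args.length > 0 ? args[0]
                : "output/freeeed-output/1002/output/results/native.zip");
        System.out.println(zip + ": " + countEntries(zip) + " entries");
    }
}
```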

Regarding the blog: sure, the address is http://batland.de. It is still being rebuilt after the last shutdown...

Best regards
Dennis

Mark Kerzner

Oct 21, 2014, 4:19:33 PM
to freedi...@googlegroups.com
Good, I will get back to you.

Mark

Dennis Pielken

Oct 26, 2014, 3:37:56 AM
to freedi...@googlegroups.com
Hi Mark,

I just wanted to ask whether you have any ideas on how to solve this problem.

Dennis

Mark Kerzner

Oct 26, 2014, 8:07:04 AM
to freedi...@googlegroups.com
Dennis,

I am working on the next release, scheduled for Nov. 3.

Mark
