Unable to import DB-Pedia Person via ETL

Wes Arnquist

Apr 30, 2015, 12:54:37 PM
to orient-...@googlegroups.com
Hello! I am trying to import the DBpedia Person dataset by following the documentation, but I am running into an error. Please see the details below and let me know if you have any ideas.

Thank you!

OrientDB release? 2.0.3
(Java is 8u40)

What steps will reproduce the problem?
1. Copied the JSON from this page: http://orientdb.com/docs/last/orientdb-etl.wiki/Import-from-DBPedia.html and pasted it into a new plain-text (ASCII) document.
2. Made changes to match my setup. The whole file follows:
{
  "config": {
    "log": "debug",
    "fileDirectory": "F:/IS_DSA/DBpedia/2014/",
    "fileName": "Person.csv.gz"
  },
  "begin": [
   { "let": { "name": "$filePath",  "value": "$fileDirectory.append( $fileName )"} },
   { "let": { "name": "$className", "value": "$fileName.substring( 0, $fileName.indexOf('.') )"} }
  ],
  "source" : {
    "file": { "path": "$filePath", "lock" : true }
  },
  "extractor" : {
    "row": {}
  },
  "transformers" : [
   { "csv": { "separator": ",", "nullValue": "NULL", "skipFrom": 1, "skipTo": 3 } },
   { "merge": { "joinFieldName":"URI", "lookup":"V.URI" } },
   { "vertex": { "class": "$className"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "remote:localhost/database/dbpedia",
      "dbUser": "tester",
      "dbPassword": "tester",
      "dbAutoCreate": true,
      "tx": false,
      "batchCommit": 1000,
      "dbType": "graph",
      "indexes": [{"class":"V", "fields":["URI:string"], "type":"UNIQUE" }]
    }
  }
}
3. Saved it as an ASCII-encoded JSON file, "import_dbpedia.json".
4. Started Server.bat (on Windows).
5. Opened cmd, changed to the bin directory, then ran "oetl.bat ../import_dbpedia.json".
6. Received the error message below. What I find strange is the truncated variable name: the JSON contains $fileName, not $fileNam. I tried changing the variable to other names of different lengths, and every time the last character of the name is cut off.
F:\IS_DSA\Graph DBs\orientdb-community-2.0.3\bin>oetl.bat ..\import_dbpedia.json
OrientDB etl v.2.0.3 (build @BUILD@) www.orientechnologies.com
Exception in thread "main" com.orientechnologies.orient.core.exception.OConfigurationException: Error on creating ETL processor
        at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:278)
        at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:188)
        at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:163)
Caused by: java.lang.NumberFormatException: For input string: "$fileNam"
        at java.lang.NumberFormatException.forInputString(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at com.orientechnologies.orient.core.record.impl.ODocumentHelper.evaluateFunction(ODocumentHelper.java:742)
        at com.orientechnologies.orient.core.record.impl.ODocumentHelper.getFieldValue(ODocumentHelper.java:481)
        at com.orientechnologies.orient.core.command.OBasicCommandContext.getVariable(OBasicCommandContext.java:121)
        at com.orientechnologies.orient.core.command.OBasicCommandContext.getVariable(OBasicCommandContext.java:57)
        at com.orientechnologies.orient.etl.OAbstractETLComponent.resolve(OAbstractETLComponent.java:130)
        at com.orientechnologies.orient.etl.block.OLetBlock.executeBlock(OLetBlock.java:59)
        at com.orientechnologies.orient.etl.block.OAbstractBlock.execute(OAbstractBlock.java:31)
        at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:231)
        ... 2 more

If you're using custom settings please provide them below (to dump all the settings run the application using the JVM argument -Denvironment.dumpCfgAtStartup=true):
Not using custom settings.

What is the expected output? What do you see instead?
I expected the ETL program to import the DB-Pedia Person data into my "dbpedia" database.
Instead I received an error message mentioning a truncated variable name - $fileNam instead of $fileName:
Exception in thread "main" com.orientechnologies.orient.core.exception.OConfigurationException: Error on creating ETL processor
...
java.lang.NumberFormatException: For input string: "$fileNam"
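
For reference, here is what the two "let" expressions in the "begin" block are intended to produce, written out as a minimal Python sketch. It only mirrors the string operations (append, substring, indexOf) with Python equivalents; it says nothing about how OrientDB ETL evaluates them internally, and the Python variable names are illustrative.

```python
# Values taken from the "config" section of import_dbpedia.json.
file_directory = "F:/IS_DSA/DBpedia/2014/"
file_name = "Person.csv.gz"

# $filePath: $fileDirectory.append( $fileName )
file_path = file_directory + file_name

# $className: $fileName.substring( 0, $fileName.indexOf('.') )
# i.e. everything before the first '.' in the file name.
class_name = file_name[:file_name.index(".")]

print(file_path)
print(class_name)
```

So the import should read F:/IS_DSA/DBpedia/2014/Person.csv.gz and create vertices of class Person.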


Thank you!

Wes Arnquist

May 4, 2015, 4:41:24 PM
to orient-...@googlegroups.com
Hi again. Here is an update on what I've tried since posting:
1. I tried repointing JAVA_HOME to the 64-bit version of Java. This did not work.
2. I tried completely uninstalling Java and reinstalling the 32-bit JDK. I am now on version 8u45. My JRE did not include a server folder, so I was told to copy it in from the JDK's JRE folder (I had done this before as well). Could this be the problem? It still did not work, and I received the same error. If copying the folder is not appropriate, how should I point OrientDB at the original server folder inside the JDK?
3. I tried renaming the "Person.csv.gz" file to "Personcsv.gz", thinking the double period might be tripping up the parser. That did not work.
4. I tried running the JSON through JSON Lint, but it did not report any problems.
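
Point 4 can also be checked locally. The sketch below, assuming Python 3, parses a trimmed copy of the config with the standard json module and lists the $variables referenced by each "let" expression. The helper name let_expressions and the trimmed sample are illustrative only. If the names come back in full, that would suggest the truncation happens inside the ETL tool rather than in the file.

```python
import json
import re

# Trimmed copy of import_dbpedia.json: only the parts relevant to the "let" blocks.
SAMPLE_CONFIG = """
{
  "config": {
    "fileDirectory": "F:/IS_DSA/DBpedia/2014/",
    "fileName": "Person.csv.gz"
  },
  "begin": [
    { "let": { "name": "$filePath",  "value": "$fileDirectory.append( $fileName )" } },
    { "let": { "name": "$className", "value": "$fileName.substring( 0, $fileName.indexOf('.') )" } }
  ]
}
"""

def let_expressions(config_text):
    """Parse the config and map each "let" variable to the $variables its expression uses."""
    config = json.loads(config_text)  # raises json.JSONDecodeError if the JSON is invalid
    result = {}
    for block in config.get("begin", []):
        if "let" in block:
            let = block["let"]
            result[let["name"]] = re.findall(r"\$\w+", let["value"])
    return result

for name, variables in let_expressions(SAMPLE_CONFIG).items():
    print(name, "uses", variables)
```

To check the real file instead of the sample, pass the contents of import_dbpedia.json to let_expressions.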

I'm out of ideas...  I'd really like to try out the ETL tool - is there some prep work I need to do before I can use it?

Thank you

Wes Arnquist

May 7, 2015, 6:38:20 PM
to orient-...@googlegroups.com
Are there any other OrientDB communities that might be able to help me?
Thank you

Jerry Kurian

Dec 7, 2019, 9:21:22 AM
to OrientDB
Were you able to resolve this issue?