OrientDB ETL: how to import a CSV file as Graph


Luca Garulli

Dec 8, 2014, 6:28:31 PM
to orient-database
Hi guys,
The ETL is moving fast toward its final stage. For all users interested in using it, I have created a new tutorial:


I hope more people will avoid writing code to import an existing DB into OrientDB and will instead save precious time by using the ETL with their own JSON configuration file.

Lvc@

Emin Agassi

Dec 8, 2014, 10:38:29 PM
to orient-...@googlegroups.com
Hi Lvc

Can we still use Blueprints' BatchGraph to batch-load large amounts of records into OrientDB?
Currently, I am using BatchGraph to load 1+ million vertices and 3+ million edges in 30 minutes.
Is this still the fastest method, or should we use the new ETL instead?
Will BatchGraph still work with v2.0?

Thanks

Luca Garulli

Dec 9, 2014, 4:09:09 AM
to orient-database
Hi Emin,
I suggest you use the new OGraphBatchInsert class available in 2.0-SNAPSHOT. It's about 10-15x faster than the classic API, but only if nodes have numeric ids.

Lvc@



Curtis Mosters

Dec 9, 2014, 4:20:02 AM
to orient-...@googlegroups.com
Great, Luca! I'm going to test it with CSV files and a MySQL database today and will let you know if something isn't working properly. =)

Curtis Mosters

Dec 9, 2014, 6:55:50 AM
to
Not able to get it working. I always get:

C:\Users\Mr. Kwox\Desktop\orientdb-community-2.0-M3\bin>oetl.bat etl-stuff\person.json
OrientDB etl v.2.0-M3 (build @BUILD@) www.orientechnologies.com
Exception in thread "main" java.lang.NoSuchMethodError: com.orientechnologies.orient.core.record.impl.ODocument.fromJSON(Ljava/lang/String;Ljava/lang/String;)Lcom/orientechnologies/orient/core/record/impl/ODocument;
        at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:147)

And I already got it working as you can see here. So what's wrong now?

Btw the json file:

{
  "source": { "file": { "path": "/etl-stuff/person.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "Person" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:C:/Users/Mr. Kwox/Desktop/orientdb-community-2.0-M3/databases/test",
      "dbType": "graph",
      "classes": [
        {"name": "Person", "extends": "V"},
        {"name": "Post", "extends": "V"},
        {"name": "HasPost", "extends": "E"}
      ],
      "indexes": [
        {"class":"Person", "fields":["ID:integer"], "type":"UNIQUE" }
      ]
    }
  }
}
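For readers new to the ETL: roughly speaking, the config above chains a `row` extractor (one line of the file at a time), a `csv` transformer (splits each line into fields named by the header line) and a `vertex` transformer (turns each record into a `Person` vertex). The plain-Java sketch below only illustrates that data flow; it is not the ETL's implementation. The class name `CsvToVertexSketch` and the sample `person.csv` content are hypothetical (the thread never shows the file), and the parser assumes simple comma separation with no quoted fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvToVertexSketch {

    // Splits CSV text into one property map per data row, keyed by the header names.
    // Simplification (assumption): plain comma separation, no quoted fields or escapes.
    static List<Map<String, Object>> parse(String csv) {
        String[] lines = csv.strip().split("\\R");
        String[] header = lines[0].split(",");
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isBlank()) {
                continue; // ignore empty lines
            }
            String[] cells = lines[i].split(",", -1); // -1 keeps trailing empty cells
            Map<String, Object> record = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                String raw = c < cells.length ? cells[c].trim() : "";
                record.put(header[c].trim(), convert(raw));
            }
            records.add(record);
        }
        return records;
    }

    // Whole numbers become Long (the config indexes ID as an integer); anything else stays a String.
    static Object convert(String raw) {
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            return raw;
        }
    }

    public static void main(String[] args) {
        // Hypothetical person.csv content; the real file's columns are not shown in the thread.
        String csv = "ID,name\n1,Hans\n2,Anna\n";
        for (Map<String, Object> record : parse(csv)) {
            // At this point the ETL's "vertex" transformer would create a Person vertex.
            System.out.println("Person " + record);
        }
    }
}
```

Running `main` prints `Person {ID=1, name=Hans}` and `Person {ID=2, name=Anna}`. Typing `ID` as a number matters because the config declares a UNIQUE index on `ID:integer`.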

Luca Garulli

Dec 9, 2014, 9:08:28 AM
to orient-database
Hi Curtis,
it seems you don't have an up-to-date OrientDB 2.0-SNAPSHOT linked to the ETL.

Lvc@
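A `NoSuchMethodError` like the one above generally means the ETL classes were compiled against an `ODocument` that has `fromJSON(String, String)` returning `ODocument` (that is what the descriptor `(Ljava/lang/String;Ljava/lang/String;)L.../ODocument;` encodes), while the `ODocument` actually loaded at run time lacks it. That is consistent with a core jar that doesn't match the ETL build. The sketch below is a hypothetical diagnostic using plain Java reflection (the class `MethodCheck` is not an OrientDB tool): it checks for the method and reports which jar the class came from.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class MethodCheck {

    // True if className is loadable and has a public method with this name, parameter types
    // and return type. The JVM links calls on the full descriptor (return type included),
    // so a mismatch in any part surfaces as NoSuchMethodError at run time.
    static boolean hasMethod(String className, String methodName,
                             String returnTypeName, Class<?>... paramTypes) {
        try {
            Class<?> cls = Class.forName(className, false, MethodCheck.class.getClassLoader());
            Method m = cls.getMethod(methodName, paramTypes);
            return m.getReturnType().getName().equals(returnTypeName);
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    // Reports where a class was loaded from (usually a jar path), to spot a stale jar.
    static String whereLoaded(String className) {
        try {
            Class<?> cls = Class.forName(className, false, MethodCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap/JDK" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String odoc = "com.orientechnologies.orient.core.record.impl.ODocument";
        // Signature taken from the stack trace: fromJSON(String, String) returning ODocument.
        boolean present = hasMethod(odoc, "fromJSON", odoc, String.class, String.class);
        System.out.println("fromJSON(String, String) present: " + present);
        System.out.println("ODocument loaded from: " + whereLoaded(odoc));
    }
}
```

Run it with the same jars the ETL runs with. If it prints `false`, or the location points at an older OrientDB jar, replacing that jar with a matching 2.0-SNAPSHOT build is the fix Luca is suggesting.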


Emin Agassi

Dec 9, 2014, 9:24:44 AM
to orient-...@googlegroups.com
Luca,

Does this mean that the new ETL will not be the fastest way to load millions of vertices and edges into a graph DB?

Curtis Mosters

Dec 9, 2014, 11:18:58 AM
to orient-...@googlegroups.com
Well, I downloaded OrientDB M3 and compiled it, and did the same with the ETL.

So I should have the latest version. Could you please tell me which one is outdated? Thanks.

Luca Garulli

Dec 9, 2014, 11:30:36 AM
to orient-database
Hi Curtis,
I'm talking about OrientDB 2.0-SNAPSHOT. Get it from:


or compile the "develop" branch.

Lvc@

Curtis Mosters

Dec 9, 2014, 12:41:44 PM
to orient-...@googlegroups.com
Ah, right. I always forget that I have to use the develop branch. It's working now. Thanks.

Curtis Mosters

Dec 10, 2014, 1:21:34 PM
to orient-...@googlegroups.com
Hey Luca, I found a small issue with the current MySQL example: http://www.orientechnologies.com/docs/last/orientdb-etl.wiki/Import-from-DBMS.html

If you do it as in your tutorial, it doesn't work (I know you haven't reworked it yet):

C:\Users\Mr. Kwox\Desktop\orientdb-community-2.0-SNAPSHOT\bin>oetl.bat etl-stuff\person.json
OrientDB etl v.2.0-SNAPSHOT (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
[1:vertex] DEBUG Transformer input: {ID:1,name:Hans}
Pipeline execution halted
ETL process halted: com.orientechnologies.orient.etl.OETLProcessHaltedException: Graph instance not found. Assure you have configured it in the Loader

But if you add the parts that are not in green, it works properly:

{
  "config": {
    "log": "debug"
  },
  "extractor" : {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
              "url": "jdbc:mysql://localhost/test",
              "userName": "root",
              "userPassword": "",
              "query": "select * from Person" }
  },
  "transformers" : [
    { "vertex": { "class": "Person"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:C:/Users/Mr. Kwox/Desktop/orientdb-community-2.0-SNAPSHOT/databases/test",
      "dbType": "graph",
      "dbAutoCreate": true
    }
  }
}