ETL for professional import to OrientDB

1 661 visningar
Hoppa till det första olästa meddelandet

Curtis Mosters

oläst,
21 aug. 2014 07:05:512014-08-21
till
Hey so I have now some experiences with ETL. I think this is great for import. That's why I want to improve the importing process now.
  1. First of all is it possible to run several imports parallel(not with several consoles) or if one is done start another instantly?
  2. How do I create an edge (https://github.com/orientechnologies/orientdb-etl/wiki/Transformer#edge is not helpful for me)

Let's say I have 2 tables in MySQL:

table1 with {name_id,name} and table2 with {name_id,text_id,text}

Now I want to get them connected (name_id) in OrientDB.

One way would now be to import both tables as Class. That's easy and then create edges with the name_id. Everything is fine with that, but it would be much easier to generate edges with ETL if that is anyway possible. So I don't know what is more efficient. Someone else also said that Sails (https://github.com/vjsrinath/sails-orientdb) might be a good choice.

If ETL allows that I could imagine about the following way:

  • read line of table1 -> safe in memory
  • look in table2 if name_id exists
    • -> if yes, create vertex Tabel1 and Table2 and create an edge
    • -> if no, create vertex Tabel1 and Table2
  • and so on

Of course this is very crazy if you imagine 30 tables or so. Was just an idea, just to check if that is already possible somehow.

Luca Garulli

oläst,
21 aug. 2014 07:25:472014-08-21
till orient-database
Hi Curtis,
ETL is very powerful and we're improving it everyday with users' feedback. We want to let it to be final for 2.0, so we can bundle with it. By the way Enterprise Edition will have a Web Interface to edit it, and in the future also to debug it.

Unfortunately I don't understand in your example table1 & 2 with both name_id field. Look at this example about Friendship. I want to connect a social network like app. 

This is an example of ETL configuration. My comments begins with // but remove them from real config file:

{
  config: {
    verbose: true
  },
  begin: [
  ],
  extractor: {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
    "url": "jdbc:mysql://localhost/mysocialnetwork",
    "userName": "root",
    "userPassword": "root",
    "query": "select * from profile"
   }
  },
  transformers : [
    {
      // THIS IS ONLY A LOG, USEFUL TO TRACE WHAT ARRIVES FROM MYSQL
      log: {
        prefix: "MySQL -> "
      }
    },
    {
      merge: {
        // LOOKUP BY ID: IF ALREADY PRESENT MERGE THE FIELDS BY OVERWRITING THE DIFFERENT ONES
        joinFieldName: "id",
        lookup: "Profile.id"
      }
    },
    {
      vertex: {
        // TRANSFORM IT IN A VERTEX
        class: "Profile"
      }
    },
    {
      edge: {
        // CONNECT THE FRIEND IN MYSQL COLUMN 'friend_id'
        class: "Friend", // Friend is the edges' class
        joinFieldName: "friend_id", // 
        lookup: "Profile.id",
        unresolvedLinkAction: "CREATE", // IF DOESN'T EXIST YET, CREATE THE VERTEX EMPTY. WILL BE FURTHER UPDATED
        if: "friend_id is not null"
      }
    },
    {
      log: {
        prefix: "Vertex -> "
      }
    }
  ],
  loader : {
    orientdb: {
      dbURL: "remote:localhost/yourdb",
      dbUser: "admin",
      dbPassword: "admin",
      dbAutoCreate: true,
      tx: true,
      batchCommit: 1000,
      dbType: "graph",
      classes: [
      ],
      indexes: [
      ]
    }
  }
}


Lvc@



On 21 August 2014 13:05, 'Curtis Mosters' via OrientDB <orient-...@googlegroups.com> wrote:
Hey so I have now some experiences with ETL. I think this is great for import. That's why I want to improve the importing process now.
  1. First of all is it possible to run several imports parallel(not with several consoles) or if one is done start another instantly?
  2. How do I create an edge (https://github.com/orientechnologies/orientdb-etl/wiki/Transformer#edge is not helpful for me)

Let's say I have 2 tables in MySQL:

table1 with {name_id,name} and table2 with {name_id,text_id,text}

Now I want to get them connected (name_id) in OrientDB.

One way would now be to import both tables as Class. That's easy and then create edges with the name_id. Everything is fine with that, but it would be much easier to generate edges with ETL if that is anyway possible. So I don't know what is more efficient. Someone else also said that Sails (https://github.com/vjsrinath/sails-orientdb) might be a good choice.

If ETL allows that I could imagine about the following way:

  • read line of table1 -> safe in memory
  • look in table2 if name_id exists
    • -> if yes, create vertex Tabel1 and Table2
    • -> if no, create vertex Tabel1 and Table2 and create an edge
  • and so on

Of course this is very crazy if you imagine 30 tables or so. Was just an idea, just to check if that is already possible somehow.

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Curtis Mosters

oläst,
21 aug. 2014 08:45:252014-08-21
till orient-...@googlegroups.com
This is a great example Luca, thanks first of all.

But could you additionally add information about the MySQL schema and do I understand it correctly that you get the classes Person,Friend and an edge. But where comes the friend data?

Luca Garulli

oläst,
21 aug. 2014 09:56:052014-08-21
till orient-database
On 21 August 2014 14:45, 'Curtis Mosters' via OrientDB <orient-...@googlegroups.com> wrote:
This is a great example Luca, thanks first of all.

But could you additionally add information about the MySQL schema and do I understand it correctly that you get the classes Person,Friend and an edge. But where comes the friend data? 


Friend in my case is an Edge class. (create class Friend extends E). It's useful to create ad-hoc edge classes instead that the generic "E":


Lvc@

 

Curtis Mosters

oläst,
25 aug. 2014 06:26:002014-08-25
till orient-...@googlegroups.com
Hi Luca, well your example is just reading one table from MySQL. One table with creating edges on it's entries. But in the usual cases that's not the way how to import from a MySQL. Or maybe I just didn't understood that example completely. But so far as I have seen it, you used one table from MySQL that contained {id,...,friend_id}. You created profiles on that id and then created edge on that id's. Is that right?

But just once again I have those data as usually stored in another table. So that cannot work for me.

Luca Garulli

oläst,
25 aug. 2014 07:23:552014-08-25
till orient-database
Curtis,
You could create 2 or more etl scripts, some that extract vertices and other that extract edges.

Lvc@

Curtis Mosters

oläst,
25 aug. 2014 08:15:002014-08-25
till orient-...@googlegroups.com
Could you maybe take some time to create such an example?

The current "edge" is bit hard to understand:

edge: {
       
class: "hasPost",
        joinFieldName
: "person_id",
        lookup
: "person.id",
        unresolvedLinkAction
: "CREATE",
       
if: "person_id is not null"
     
}

How can I say here to Link from person on post? I just need a small example and also the MySQL strcuture if possible. That would make everything clear. Thanks Luca.

Curtis Mosters

oläst,
25 aug. 2014 09:23:142014-08-25
till orient-...@googlegroups.com
Well there is an idea that might be easy to be implemented.

So the idea is to auto-create an edge. Let's take an example:

MySQL classes are:
  • Person{id,name}
  • Post{,person_id,title}

Now we also create such classes in OrientDB. Now we tell OrientDB if there is INSERT INTO Person or Post -> create edge on "id" to "person_id". I don't but isn't that pretty simple for a database to do?

Well I have to say all of that is possible easily done with Java API but I really want it with ETL because there is no real need for me to use the Java API.

Luca Garulli

oläst,
25 aug. 2014 10:11:322014-08-25
till orient-database
Hi Curtis,
getting your structure on MySQL:
  • Person{id,name}
  • Post{person_id,title}
You could configure 2 pipelines:
  1. one with extraction of "select * from Person"
  2. one with extraction of "select * from Post"

Example for (1)

{
  "config": {
    "verbose": true

  },
  "extractor" : {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
              "url": "jdbc:mysql://localhost/mysql",
              "userName": "root",
              "userPassword": "",
              "query": "select * from Person" }
  },
  "transformers" : [
   { "vertex": { "class": "Person"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/orientdb",
      "dbAutoCreate": true
    }
  }
}

And this for (2):

{
  "config": {
    "verbose": true

  },
  "extractor" : {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
              "url": "jdbc:mysql://localhost/mysql",
              "userName": "root",
              "userPassword": "",
              "query": "select * from Post" }
  },
  "transformers" : [
   { "vertex": { "class": "Post"} },
   { "edge": { "class": "Person", "direction" : "in", 
            "joinFieldName": "person_id",
            "lookup":"Person.id", "unresolvedLinkAction":"CREATE"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/orientdb",
      "dbAutoCreate": true
    }
  }
}


Lvc@

Curtis Mosters

oläst,
25 aug. 2014 10:40:402014-08-25
till orient-...@googlegroups.com
Thanks Luca, I tried it out. The Person import works. But the more complex vetrex and edge thing does not.

And with...
{ "edge": { "class": "Person", "
...
I get
Exception in thread "main" java.lang.IllegalArgumentException: Type error. The class Person does not extend class 'E' and therefore cannot be considered an Edge

So I thought it was just a mistake. I changed it to a Class E called "hasPost". And now I get:

Exception in thread "main" java.lang.IllegalArgumentException: Property value ca
n
not be null
        at com
.tinkerpop.blueprints.util.ExceptionFactory.propertyValueCanNotBeN
ull
(ExceptionFactory.java:60)
        at com
.tinkerpop.blueprints.impls.orient.OrientElement.validateProperty(
OrientElement.java:423)
        at com
.tinkerpop.blueprints.impls.orient.OrientElement.setProperty(Orien
tElement
.java:139)
        at com
.orientechnologies.orient.etl.transformer.OEdgeTransformer.execute
Transform(OEdgeTransformer.java:102)
        at com
.orientechnologies.orient.etl.transformer.OAbstractTransformer.tra
nsform
(OAbstractTransformer.java:37)
        at com
.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.ja
va
:97)
        at com
.orientechnologies.orient.etl.OETLProcessor.executeSequentially(OE
TLProcessor.java:459)
        at com
.orientechnologies.orient.etl.OETLProcessor.execute(OETLProcessor.
java
:253)
        at com
.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.jav
a
:221)
+ extracted 1 records (0 records/sec) - 1 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1000ms [0 warnings, 0 errors]
+ extracted 1 records (0 records/sec) - 1 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 2s [0 warnings, 0 errors]

Luca Garulli

oläst,
25 aug. 2014 10:54:162014-08-25
till orient-database
Sorry, create a new Edge's class: "Wrote":

create class Wrote extends E

And put "Wrote" in the Edge's configuration.

Lvc@

Curtis Mosters

oläst,
25 aug. 2014 11:52:532014-08-25
till
No problem Luca, but still the same problem. Here are my JSON's. Just in case I have something missing there. I braked it down to the things I really need:

Person ETL:
{
 
"config": {
   
"verbose": true

 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",

             
"url": "jdbc:mysql://localhost/test",

             
"userName": "root",
             
"userPassword": "",

             
"query": "select * from person" }

 
},
 
"transformers" : [
   
{ "vertex": { "class": "Person"} }
 
],
   
"loader" : {
   
"orientdb": {

     
"dbURL": "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test5",
     
"tx": true,
     
"batchCommit": 5000,
     
"wal" : true,
     
"dbType": "graph"
   
}
 
}
}


Post and edge Wrote ETL:
{
 
"config": {
   
"verbose": true

 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",

             
"url": "jdbc:mysql://localhost/test",

             
"userName": "root",
             
"userPassword": "",

             
"query": "select * from post" }

 
},
 
"transformers" : [
   
{ "vertex": { "class": "Post"} },

   
{ "edge": { "class": "Wrote", "direction" : "in",
           
"joinFieldName": "person_id",

           
"lookup":"Person.id", "unresolvedLinkAction":"CREATE"} }
 
],
 
"loader" : {
   
"orientdb": {

     
"dbURL": "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test5",
     
"tx": true,
     
"batchCommit": 5000,
     
"wal" : true,
     
"dbType": "graph"
   
}
 
}
}

And Luca, no the Edge problem was already fixed be myself =)

As already said, this is what Im no able to fix now =/

Luca Garulli

oläst,
25 aug. 2014 12:07:012014-08-25
till orient-database
What's the error?


On 25 August 2014 17:52, 'Curtis Mosters' via OrientDB <orient-...@googlegroups.com> wrote:
No problem Luca, but still the same problem. Here are my JSON's. Just in case I have something missing there. I braked it down to the things I really need:

Person ETL:
{

 
"config": {
   
"verbose": true

 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",

             
"url": "jdbc:mysql://localhost/test",

             
"userName": "root",
             
"userPassword": "",

             
"query": "select * from person" }

 
},
 
"transformers" : [
   
{ "vertex": { "class": "Person"} }
 
],
   
"loader" : {
   
"orientdb": {

     
"dbURL": "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test5",

     
"tx": true,
     
"batchCommit": 5000,
     
"wal" : true,
     
"dbType": "graph"
   
}
 
}
}


Post and edge Wrote ETL:
{

 
"config": {
   
"verbose": true

 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",

             
"url": "jdbc:mysql://localhost/test",

             
"userName": "root",
             
"userPassword": "",

             
"query": "select * from post" }

 
},
 
"transformers" : [
   
{ "vertex": { "class": "Post"} },

   
{ "edge": { "class": "Wrote", "direction" : "in",
           
"joinFieldName": "person_id",

           
"lookup":"Person.id", "unresolvedLinkAction":"CREATE"} }
 
],
 
"loader" : {
   
"orientdb": {

     
"dbURL": "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test5",

     
"tx": true,
     
"batchCommit": 5000,
     
"wal" : true,
     
"dbType": "graph"
   
}
 
}
}

And Luca, no the Edge problem was already fixed be meself =)

not it's this here take I can't handle to fix =/


Exception in thread "main" java.lang.IllegalArgumentException: Property value can not be null




Curtis Mosters

oläst,
25 aug. 2014 12:12:222014-08-25
till orient-...@googlegroups.com
Luca, I really don't know =/. Here is a picture, maybe more helpful:

http://i.imgur.com/f9QJvM8.png

Luca Garulli

oläst,
25 aug. 2014 12:16:432014-08-25
till orient-database
No problem,
the person id was null. Change the edge block to treat with a WARNING the case:

{ "edge": { "class": "Wrote", "direction" : "in", 
            
"joinFieldName": "person_id",
            "lookup":"Person.id", "unresolvedLinkAction":"WARNING"} }

Lvc@ 

Curtis Mosters

oläst,
25 aug. 2014 12:37:582014-08-25
till
Thanks Luca, the problem was actually not the ETL. It was that I mixed up the column name id and person_id in MySQL. Simply switched that names and it runs now.

But somehow no Edge is created. I don't get a warning or something like that. It just seems that he does not find an id to join. But as you can see from the example above I have same ID in that classes. So what else could I try?

Curtis Mosters

oläst,
25 aug. 2014 15:18:562014-08-25
till
Well I tried some hours to get this running. Even studying http://www.orientechnologies.com/docs/1.7.8/orientdb-etl.wiki/Transformer.html#edge but nothing new to me.

Somehow the edges cannot be created with ETL. I'm using orientdb-etl-0.9.jar. Should I maybe use another version?

Edit: also tried orientdb-etl-0.9.2-SNAPSHOT.jar, still not working =/

Edit2: also tried exactly the same example of "Author" and "Post" on http://www.orientechnologies.com/docs/last/orientdb.wiki/Transactions.html But still no success there =/

Curtis Mosters

oläst,
25 aug. 2014 16:45:472014-08-25
till orient-...@googlegroups.com
Alright I just want to give an update. So I have the Author ETL:

{
 
"config": {
   
"verbose": true
 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",
             
"url": "jdbc:mysql://localhost/test",
             
"userName": "root",
             
"userPassword": "",

             
"query": "select * from Author" }
 
},
 
"transformers" : [
   
{ "vertex": { "class": "Author"} }
 
],
   
"loader" : {
   
"orientdb": {
     
"dbURL": "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test6",
     
"dbAutoCreate": true
   
}
 
}
}

The database does not exist. Maybe that's the reason why it's not working for me. I always runned my example with an existing database. I dunno. But here is the error I get when running that JSON:

D:\Eclipse LaTeX\DB - orientdb\bin>oetl.bat backup\luca\Author.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
ETL process halted
: com.orientechnologies.orient.etl.OETLProcessHaltedException:
 java
.lang.NullPointerException

And just when I add:

"dbType": "graph"

to the "loader" I get it running. Is that right, but why is that working for you Lucan and not for me? =/


Luca Garulli

oläst,
25 aug. 2014 16:59:582014-08-25
till orient-database
Hi Curtis,

I don't know why you change from Person to Author, maybe this is the problem?

However, use last release 0.9.2-SNAPSHOT. About the loader you can setup more params. For example it's important for performance reason that you set an index against Person.id. I've added also a log before and after transformation. Try running the following:

{
  "config": {
    "verbose": true
  },
  "extractor" : {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
              "url": "jdbc:mysql://localhost/test",
              "userName": "root",
              "userPassword": "",
              "query": "select * from Author" }
  },
  "transformers" : [
   { log : { prefix: "BEFORE->" } },
   { "vertex": { "class": "Author"} },
   { log : { prefix: "AFTER->" } }

  ],
   "loader" : {
    "orientdb": {
      dbURL: "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test6",
      dbUser: "admin",
      dbPassword: "admin",
      dbAutoDropIfExists: false,
      dbAutoCreate: true,
      tx: false,
      wal: false,
      batchCommit: 1000,
      dbType: "graph",
      indexes: [{class:"Person", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
    }
  }
}

Lvc@



Curtis Mosters

oläst,
26 aug. 2014 04:17:062014-08-26
till
Switching from Person to Author was just to test your Tutorial as exactly as possible. I now added some logs. There you can see that there is no warning but also no edge is created:

Person ETL:

{
 
"config": {
   
"verbose": true
 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",
             
"url": "jdbc:mysql://localhost/test",
             
"userName": "root",
             
"userPassword": "",
             
"query": "select * from Author" }
 
},
 
"transformers" : [

   
{ log : { prefix: "BEFORE->" } },
   
{ "vertex": { "class": "Author"} },
   
{ log : { prefix: "AFTER->" } }
 
],
   
"loader" : {
   
"orientdb": {

      dbURL
: "plocal:C:\Users\kwoxer\Desktop\DB - orientdb\databases\Test",
      dbUser
: "root",
      dbPassword
: "root",

      dbAutoDropIfExists
: false,
      dbAutoCreate
: true,
      tx
: false,
      wal
: false,
      batchCommit
: 1000,
      dbType
: "graph",

      indexes
: [{class:"Author", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
   
}
 
}
}

The results in command line:

C:\Users\kwoxer\Desktop\DB - orientdb\bin>oetl.bat backup\luca\person.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1015ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 2015ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 3016ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 4016ms [0 warnings, 0 errors]
BEFORE
->{id:1,name:wda}
AFTER
->v(Author)[#11:0]
BEFORE
->{id:2,name:fwf}
AFTER
->v(Author)[#11:1]
END ETL PROCESSOR
+ extracted 3 records (7 records/sec) - 3 records -> loaded 2 vertices (4 vertic
es
/sec) Total time: 4427ms [0 warnings, 0 errors]

Post/Edge ETL:

{
 
"config": {
   
"verbose": true
 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",
             
"url": "jdbc:mysql://localhost/test",
             
"userName": "root",
             
"userPassword": "",

             
"query": "select * from Post" }
 
},
 
"transformers" : [
   
{ log : { prefix: "LOG 1->" } },
   
{ "vertex": { "class": "Post"} },
   
{ log : { prefix: "LOG 2->" } },

   
{ "edge": { "class": "Wrote", "direction" : "in",
           
"joinFieldName": "person_id",

           
"lookup":"Author.id", "unresolvedLinkAction":"WARNING"} },
   
{ log : { prefix: "LOG 3->" } }
 
],
   
"loader" : {
   
"orientdb": {
      dbURL
: "plocal:C:\Users\kwoxer\Desktop\DB - orientdb\databases\Test",
      dbUser
: "root",
      dbPassword
: "root",

      dbAutoDropIfExists
: false,
      dbAutoCreate
: true,
      tx
: false,
      wal
: false,
      batchCommit
: 1000,
      dbType
: "graph",

      indexes
: [{class:"Post", fields:["person_id:string"], type:"UNIQUE_HASH_INDEX" }]
   
}
 
}
}

Here the command line output:

C:\Users\kwoxer\Desktop\DB - orientdb\bin>oetl.bat backup\luca\postedge.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1000ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 2001ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 3001ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 4001ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 5001ms [0 warnings, 0 errors]
LOG
1->{person_id:1,title:wadwd}
LOG
2->v(Post)[#12:0]
LOG
3->v(Post)[#12:0]
LOG
1->{person_id:2,title:fawf}
LOG
2->v(Post)[#12:1]
LOG
3->v(Post)[#12:1]
END ETL PROCESSOR
+ extracted 3 records (8 records/sec) - 3 records -> loaded 2 vertices (5 vertic
es
/sec) Total time: 5356ms [0 warnings, 0 errors]

I don't know why it is working for you. Did you already try one ETL of mine? Maybe you have some other config? Maybe something "oetl.bat" is not able to do, I think you are running a Linux when testing, don't you?

Maybe also interesting: if I change the "joinFieldName" to something that is not in MySQL I see that he is actually doing something with edges. Here is the command line output I get then:

C:\Users\kwoxer\Desktop\DB - orientdb\bin>oetl.bat backup\luca\postedge.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1010ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 2026ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 3026ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 4057ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 5058ms [0 warnings, 0 errors]
LOG
1->{person_id:1,title:wadwd}
LOG
2->v(Post)[#12:0]
edge
->edge: WARN Cannot resolve join for value 'null'
LOG
3->v(Post)[#12:0]
LOG
1->{person_id:2,title:fawf}
LOG
2->v(Post)[#12:1]
edge
->edge: WARN Cannot resolve join for value 'null'
LOG
3->v(Post)[#12:1]
END ETL PROCESSOR
+ extracted 3 records (11 records/sec) - 3 records -> loaded 2 vertices (7 verti
ces
/sec) Total time: 5328ms [2 warnings, 0 errors]

Curtis Mosters

oläst,
27 aug. 2014 10:04:352014-08-27
till orient-...@googlegroups.com
Luca, could you also if you can't help me just give a statement. Just say it does not work currently or you testing it or anything.

Just that I know this problem gets a solution because currently it's unusable for Windows users =/

Thank you.


Am Dienstag, 26. August 2014 10:17:06 UTC+2 schrieb Curtis Mosters:
Switching from Person to Author was just to test your Tutorial as exactly as possible. I now added some logs. There you can see that there is no warning but also no edge is created:

Person ETL:

{
 
"config": {
   
"verbose": true
 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",
             
"url": "jdbc:mysql://localhost/test",
             
"userName": "root",
             
"userPassword": "",
             
"query": "select * from Author" }
 
},
 
"transformers" : [

   
{ log : { prefix: "BEFORE->" } },
   
{ "vertex": { "class": "Author"} },
   
{ log : { prefix: "AFTER->" } }
 
],
   
"loader" : {
   
"orientdb": {

      dbURL
: "plocal:C:\Users\kwoxer\Desktop\DB - orientdb\databases\Test",
      dbUser
: "root",

      dbPassword
: "root",

      dbAutoDropIfExists
: false,
      dbAutoCreate
: true,
      tx
: false,
      wal
: false,
      batchCommit
: 1000,
      dbType
: "graph",

      indexes
: [{class:"Author", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
   
}
 
}
}

The results in command line:

C:\Users\kwoxer\Desktop\DB - orientdb\bin>oetl.bat backup\luca\person.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1015ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 2015ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 3016ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 4016ms [0 warnings, 0 errors]
BEFORE
->{id:1,name:wda}
AFTER
->v(Author)[#11:0]
BEFORE
->{id:2,name:fwf}
AFTER
->v(Author)[#11:1]
END ETL PROCESSOR
+ extracted 3 records (7 records/sec) - 3 records -> loaded 2 vertices (4 vertic
es
/sec) Total time: 4427ms [0 warnings, 0 errors]

Post/Edge ETL:

{
 
"config": {
   
"verbose": true
 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",
             
"url": "jdbc:mysql://localhost/test",
             
"userName": "root",
             
"userPassword": "",

             
"query": "select * from Post" }
 
},
 
"transformers" : [
   
{ log : { prefix: "LOG 1->" } },
   
{ "vertex": { "class": "Post"} },

   
{ log : { prefix: "LOG 2->" } },

   
{ "edge": { "class": "Wrote", "direction" : "in",
           
"joinFieldName": "person_id",

           
"lookup":"Author.id", "unresolvedLinkAction":"WARNING"} },
   
{ log : { prefix: "LOG 3->" } }

 
],
   
"loader" : {
   
"orientdb": {

      dbURL
: "plocal:C:\Users\kwoxer\Desktop\DB - orientdb\databases\Test",
      dbUser
: "root",

      dbPassword
: "root",

      dbAutoDropIfExists
: false,
      dbAutoCreate
: true,
      tx
: false,
      wal
: false,
      batchCommit
: 1000,
      dbType
: "graph",

      indexes
: [{class:"Post", fields:["person_id:string"], type:"UNIQUE_HASH_INDEX" }]
   
}
 
}
}

Here the command line output:

C:\Users\kwoxer\Desktop\DB - orientdb\bin>oetl.bat backup\luca\postedge.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1000ms [0 warnings, 0 errors]

+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 2001ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 3001ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 4001ms [0 warnings, 0 errors]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 5001ms [0 warnings, 0 errors]
LOG
1->{person_id:1,title:wadwd}
LOG
2->v(Post)[#12:0]
LOG
3->v(Post)[#12:0]
LOG
1->{person_id:2,title:fawf}
LOG
2->v(Post)[#12:1]
LOG
3->v(Post)[#12:1]
END ETL PROCESSOR
+ extracted 3 records (8 records/sec) - 3 records -> loaded 2 vertices (5 vertic
es
/sec) Total time: 5356ms [0 warnings, 0 errors]

I don't know why it is working for you. Did you already try one ETL of mine? Maybe you have some other config? Maybe something "oetl.bat" is n...

Luca Garulli

oläst,
27 aug. 2014 18:27:232014-08-27
till orient-database
Hi Curtis,
if you get last snapshot I've introduced the log level. Remove "verbose" (deprecated) and set: "log": "debug" in config:

{
  
"config": {
    
"log": "true"
  
},

Then please send me the logs.

Lvc@



--

Curtis Mosters

oläst,
28 aug. 2014 04:14:382014-08-28
till orient-...@googlegroups.com
Just to make sure:

"log": "debug"
or
"log": "true"
?

LOG
2->v(Post)[<span
...

Luca Garulli

oläst,
28 aug. 2014 05:37:152014-08-28
till orient-database
Ops, sorry:

{ log : "debug"  }

Lvc@



--

Curtis Mosters

oläst,
28 aug. 2014 06:14:192014-08-28
till orient-...@googlegroups.com
Well not more infos in the command. Here again my updated JSON:

Author ETL:

{
 
"config": {
   
"log": "debug"

 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",
             
"url": "jdbc:mysql://localhost/test",
             
"userName": "root",
             
"userPassword": "",
             
"query": "select * from Author" }
 
},
 
"transformers" : [
   
{ log : { prefix: "BEFORE->" } },
   
{ "vertex": { "class": "Author"} },
   
{ log : { prefix: "AFTER->" } }
 
],
   
"loader" : {
   
"orientdb": {

     
"dbURL": "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test6",

      dbUser
: "root",
      dbPassword
: "root",
      dbAutoDropIfExists
: false,
      dbAutoCreate
: true,
      tx
: false,
      wal
: false,
      batchCommit
: 1000,
      dbType
: "graph",
      indexes
: [{class:"Author", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
   
}
 
}
}



and output:

D:\Eclipse LaTeX\DB - orientdb\bin>oetl.bat backup\luca\Author.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN
ETL PROCESSOR
BEFORE
->{id:1,name:asdd}
AFTER
->v(Author)[#11:0]
BEFORE
->{id:2,name:aasdasd}

AFTER
->v(Author)[#11:1]
END ETL PROCESSOR
+ extracted 3 records (29 records/sec) - 3 records -> loaded 2 vertices (19 vert
ices
/sec) Total time: 3103ms [0 warnings, 0 errors]



and the Post/Edge ETL:

{
 
"config": {
   
"log": "debug"

 
},
 
"extractor" : {
   
"jdbc": { "driver": "com.mysql.jdbc.Driver",
             
"url": "jdbc:mysql://localhost/test",
             
"userName": "root",
             
"userPassword": "",
             
"query": "select * from Post" }
 
},
 
"transformers" : [

   
{ log : { prefix: "BEFORE->" } },
   
{ "vertex": { "class": "Post"} },
   
{ log : { prefix: "MIDDLE->" } },

   
{ "edge": { "class": "Wrote", "direction" : "in",

           
"joinFieldName": "author_id",

           
"lookup":"Author.id", "unresolvedLinkAction":"WARNING"} },

   
{ log : { prefix: "AFTER->" } }
 
],
 
"loader" : {
   
"orientdb": {

     
"dbURL": "plocal:D:\Eclipse LaTeX\DB - orientdb\databases\Test6",

      dbUser
: "root",
      dbPassword
: "root",
      dbAutoDropIfExists
: false,
      dbAutoCreate
: true,
      tx
: false,
      wal
: false,
      batchCommit
: 1000,
      dbType
: "graph",

      indexes
: [{class:"Post", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
   
}
 
}
}



and it's output:

D:\Eclipse LaTeX\DB - orientdb\bin>oetl.bat backup\luca\Post.json
OrientDB etl v.1.7.8 (build @BUILD@) www.orientechnologies.com
BEGIN
ETL PROCESSOR
BEFORE
->{title:jop es klappt,author_id:1,text:wdwd}
MIDDLE
->v(Post)[#13:0]
AFTER
->v(Post)[#13:0]
BEFORE
->{title:aaaa,author_id:2,text:awfwaf}
MIDDLE
->v(Post)[#13:1]
AFTER
->v(Post)[#13:1]
END ETL PROCESSOR
+ extracted 3 records (0 records/sec) - 3 records -> loaded 2 vertices (0 vertic
es
/sec) Total time: 947ms [0 warnings, 0 errors]


Curtis Mosters

oläst,
28 aug. 2014 06:22:552014-08-28
till orient-...@googlegroups.com
Ahh I didn't use the latest snapshot, alright so I need to install ant. brb with an update
OrientDB etl v.1.7.8 (build @BUILD@) www<span style="color: #660;" class="styled
...

Curtis Mosters

oläst,
28 aug. 2014 08:03:282014-08-28
till orient-...@googlegroups.com
Ok so I have now 2.0 and getting this here:

C:\Users\Mr. Kox\Desktop\orientdb-develop\releases\orientdb-community-2.0-SNAPSH
OT
\bin>oetl.bat backup\luca\Author.json
OrientDB etl v.2.0-SNAPSHOT (build @BUILD@) www.orientechnologies.com
Exception in thread "main" com.orientechnologies.orient.core.exception.OConfigur
ationException
: Error on creating ETL processor
        at com
.orientechnologies.orient.etl.OETLProcessor.<init>(OETLProcessor.j
ava
:163)
        at com
.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.jav
a
:220)
Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Canno
t create database
        at com
.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.c
reate
(ODatabaseRecordAbstract.java:418)
        at com
.orientechnologies.orient.core.db.ODatabaseWrapperAbstract.create(
ODatabaseWrapperAbstract.java:61)
        at com
.orientechnologies.orient.core.db.ODatabaseRecordWrapperAbstract.c
reate
(ODatabaseRecordWrapperAbstract.java:70)
        at com
.orientechnologies.orient.core.db.ODatabaseWrapperAbstract.create(
ODatabaseWrapperAbstract.java:56)
        at com
.orientechnologies.orient.core.db.ODatabaseRecordWrapperAbstract.c
reate
(ODatabaseRecordWrapperAbstract.java:64)
        at com
.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getDatabase(
OrientGraphFactory.java:139)
        at com
.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getDatabase(
OrientGraphFactory.java:118)
        at com
.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getNoTx(Orie
ntGraphFactory
.java:102)
        at com
.tinkerpop.blueprints.impls.orient.OrientGraphFactory.get(OrientGr
aphFactory
.java:78)
        at com
.orientechnologies.orient.etl.loader.OOrientDBLoader.configure(OOr
ientDBLoader
.java:185)
        at com
.orientechnologies.orient.etl.OETLProcessor.configureComponent(OET
LProcessor.java:470)
        at com
.orientechnologies.orient.etl.OETLProcessor.<init>(OETLProcessor.j
ava
:136)
       
... 1 more
Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Canno
t create database
        at com
.orientechnologies.orient.core.db.raw.ODatabaseRaw.create(ODatabas
eRaw
.java:150)
        at com
.orientechnologies.orient.core.db.ODatabaseWrapperAbstract.create(
ODatabaseWrapperAbstract.java:61)
        at com
.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.c
reate
(ODatabaseRecordAbstract.java:369)
       
... 12 more
Caused by: java.lang.IllegalStateException: Location passed in WAL does not exis
t
, or IO error was happened. DB can not work in durable mode in such case.
        at com
.orientechnologies.orient.core.storage.impl.local.paginated.wal.OD
iskWriteAheadLog
.<init>(ODiskWriteAheadLog.java:609)
        at com
.orientechnologies.orient.core.storage.impl.local.paginated.wal.OD
iskWriteAheadLog
.<init>(ODiskWriteAheadLog.java:585)
        at com
.orientechnologies.orient.core.storage.impl.local.paginated.OLocal
PaginatedStorage.initWalAndDiskCache(OLocalPaginatedStorage.java:305)
        at com
.orientechnologies.orient.core.storage.impl.local.OAbstractPaginat
edStorage
.create(OAbstractPaginatedStorage.java:218)
        at com
.orientechnologies.orient.core.storage.impl.local.paginated.OLocal
PaginatedStorage.create(OLocalPaginatedStorage.java:97)
        at com
.orientechnologies.orient.core.db.raw.ODatabaseRaw.create(ODatabas
eRaw
.java:146)
       
... 14 more

I have the "orientdb-etl-0.9.2-SNAPSHOT.jar". What else can I do?

Luca Garulli

oläst,
28 aug. 2014 09:56:332014-08-28
till orient-database
Seems the database can't be created. Please check the db path, in case there is a previous database drop it (from FS)

Lvc@


Curtis Mosters

oläst,
28 aug. 2014 10:16:262014-08-28
till orient-...@googlegroups.com
No the folder is completely empty.

But somehow I can't even create a database in the Studio.

http://i.imgur.com/Yacjoj7.png

401 Unauthorized.

Luca Garulli

oläst,
28 aug. 2014 10:59:582014-08-28
till orient-database
Is that the root's password?

Curtis Mosters

oläst,
28 aug. 2014 11:50:422014-08-28
till
Totally forgot that it is a complete other config. For sure. Now it's working in the studio.

Mhh ok it was something with the database path in the ETL. Took my old one and it works.

This is what I got as output:

D:\Eclipse LaTeX\DB - orientdb\orientdb-community-2.0-SNAPSHOT\bin>oetl.bat back
up
\luca\Author.json
OrientDB etl v.2.0-SNAPSHOT (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
BEFORE
->{id:1,name:asdd}
AFTER
->v(Author)[#11:0]
BEFORE
->{id:2,name:aasdasd}
AFTER
->v(Author)[#11:1]
END ETL PROCESSOR
+ extracted 3 records (3 records/sec) - 3 records -> loaded 2 vertices (2 vertic
es
/sec) Total time: 1824ms [0 warnings, 0 errors]
D
:\Eclipse LaTeX\DB - orientdb\orientdb-community-2.0-SNAPSHOT\bin>oetl.bat back
up
\luca\Post.json
OrientDB etl v.2.0-SNAPSHOT (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
BEFORE
->{title:jop es klappt,author_id:1,text:wdwd}

MIDDLE
->v(Post)[#12:0]
AFTER
->v(Post)[#12:0]

BEFORE
->{title:aaaa,author_id:2,text:awfwaf}

MIDDLE
->v(Post)[#12:1]
AFTER
->v(Post)[#12:1]
END ETL PROCESSOR
+ extracted 3 records (4 records/sec) - 3 records -> loaded 2 vertices (3 vertic
es
/sec) Total time: 2615ms [0 warnings, 0 errors]

Luca Garulli

oläst,
28 aug. 2014 12:03:032014-08-28
till orient-database
Curtis,
Please update the ETL project, I've just pushed 2 fixes.

Please drop the database and restart from scratch.

Lvc@



On 28 August 2014 17:50, 'Curtis Mosters' via OrientDB <orient-...@googlegroups.com> wrote:
Totally forgot that it is a complete other config. For sure. Now it's working in the studio.

I again tried the ETL script but still having this here:

D:\Eclipse LaTeX\DB - orientdb\orientdb-community-2.0-SNAPSHOT\bin>oetl.bat back
up
\luca\Author.json
Am Donnerstag, 28. August 2014 16:59:58 UTC+2 schrieb Lvc@:

Curtis Mosters

oläst,
28 aug. 2014 12:44:522014-08-28
till orient-...@googlegroups.com
D:\Eclipse LaTeX\DB - orientdb\orientdb-community-2.0-SNAPSHOT\bin>oetl.bat back
up
\luca\Author.json
OrientDB etl v.2.0-SNAPSHOT (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Author'
[orientdb] DEBUG - OrientDBLoader: created property 'Author.id' of type: string

+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1000ms [0 warnings, 0 errors]
[orientdb] DEBUG - OrientDocumentLoader: created index 'Author.id' type 'UNIQUE_
HASH_INDEX'
against Class 'Author', fields [id:string]

BEFORE
->{id:1,name:asdd}
AFTER
->v(Author)[#11:0]
BEFORE
->{id:2,name:aasdasd}
AFTER
->v(Author)[#11:1]
END ETL PROCESSOR
+ extracted 5 records (8 records/sec) - 5 records -> loaded 2 vertices (3 vertic
es
/sec) Total time: 1565ms [0 warnings, 0 errors]

D
:\Eclipse LaTeX\DB - orientdb\orientdb-community-2.0-SNAPSHOT\bin>oetl.
bat back
up
\luca\Post.json
OrientDB etl v.2.0-SNAPSHOT (build @BUILD@) www.orientechnologies.com
BEGIN ETL PROCESSOR
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Post'
[orientdb] DEBUG - OrientDBLoader: created property 'Post.id' of type: string

+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 1000ms [0 warnings, 0 errors]
[orientdb] DEBUG - OrientDocumentLoader: created index 'Post.id' type 'UNIQUE_HA
SH_INDEX'
against Class 'Post', fields [id:string]
+ extracted 0 records (0 records/sec) - 0 records -> loaded 0 vertices (0 vertic
es
/sec) Total time: 2s [0 warnings, 0 errors]

BEFORE
->{title:jop es klappt,author_id:1,text:wdwd}

MIDDLE
->v(Post)[#12:0]
[1:edge] DEBUG joinValue=1, lookupResult=Author#11:0{id:1,name:asdd} v1
AFTER
->v(Post)[#12:0]

BEFORE
->{title:aaaa,author_id:2,text:awfwaf}

MIDDLE
->v(Post)[#12:1]
[3:edge] DEBUG joinValue=2, lookupResult=Author#11:1{id:2,name:aasdasd} v1
AFTER
->v(Post)[#12:1]
END ETL PROCESSOR
+ extracted 5 records (68 records/sec) - 5 records -> loaded 2 vertices (27 vert
ices
/sec) Total time: 2073ms [0 warnings, 0 errors]
...

Luca Garulli

oläst,
25 sep. 2014 20:32:192014-09-25
till orient-database
Hey Curtis,
I've just fixed a bug on edge creation. Please could you retry with last ETL version from github? (1.1-SNAPSHOT)

Lvc@


--

Curtis Mosters

oläst,
26 sep. 2014 03:07:222014-09-26
till orient-...@googlegroups.com
Hi Luca, it seem to work now.

But just tested with leightweight edges.

http://i.imgur.com/K18qTWm.png
...

Bojan Vukotić

oläst,
20 okt. 2014 04:51:032014-10-20
till orient-...@googlegroups.com
Hi guys!

The whole discussion here is how to create edges from one table to another, but what to do if we have more complex cases where we have connected 2 (or even more) tables? Example, n:n relation: book and authors, book can have one or more authors and author can work on one or more books.

Tables:

Book {
       book_id
       book_name,
        ....
}

Author {
      author_id,
      author_name,
      ......
}


Author_on_Book {
       ab_id,
       book_id,
       author_id,
       description // describes what this author did on this book
}

How to migrate this case? Should "Author_on_Book" be migrated as a vertex or edge? How to write scripts in this case? (in real life we could have even more foreign keys in "Author_on_Book" table )


Curtis Mosters

oläst,
20 okt. 2014 06:16:082014-10-20
till orient-...@googlegroups.com
Well you have several ways. Do you want to do it with JAVA oder the ETL plugin?

In any case I think it should be

Vertices: Author, Book
Edge: WROTE

WDYT?

Bojan Vukotić

oläst,
20 okt. 2014 06:36:572014-10-20
till orient-...@googlegroups.com

I prefer do it with ETL, if it is possible, I would like to avoid programming. If not, Java is also a good solution.

So, example how to do it in ETL? And regarding ETL, I was playing with it, it imports vertices nicely, but when I want to import edges (100 000 of them) it is extremely slow :( How to improve this?

Curtis Mosters

oläst,
20 okt. 2014 08:43:112014-10-20
till orient-...@googlegroups.com
Well I think it's way better to create a Java example. Then you understand what is happening in the background. Otherwhise in my tests the ETL way had the same speed, but these tests are 3-4 month old. I will redo them soon. Did you take the example of ETL from OrientDB? Otherwhise look above for some examples. Or even post yours here?

Bojan Vukotić

oläst,
20 okt. 2014 08:55:102014-10-20
till orient-...@googlegroups.com
I took example from here http://www.orientechnologies.com/docs/last/orientdb-etl.wiki/Import-from-DBMS.html

When can I find ETL/Java example?

Curtis Mosters

oläst,
20 okt. 2014 09:15:272014-10-20
till orient-...@googlegroups.com
I think I don't understand your issue:

First of all you import all Books. So you have all the data in there. Now setting an index on the Book.ID.

After that the Authors are imported. They are matched with the ID of the Book. I think in your case the Author matched with a Book is the same like the Book.ID? Or do you have another file containing the relations of the ID's?

Can you post here the first 10 lines of each file maybe? That would help a low.

Bojan Vukotić

oläst,
20 okt. 2014 09:26:272014-10-20
till orient-...@googlegroups.com

Well, that's exactly what I need (and what I already did). I thought maybe that ETL has some API that we could use to make this easier (I used 'pure' OrientDB API to implement this)

Curtis Mosters

oläst,
20 okt. 2014 14:31:382014-10-20
till orient-...@googlegroups.com
Is you code secret? I still don't understand the real issue you have, sorry.

Luca Garulli

oläst,
20 okt. 2014 14:51:222014-10-20
till orient-database
@Bojan, How did you import edges? Can you share the code? It's hard to help without any information.

Lvc@


Bojan Vukotić

oläst,
21 okt. 2014 02:55:352014-10-21
till orient-...@googlegroups.com
This is my code (I didn't use actually Book and Author tables, I used my domain specific tables, but here I illustrated problem with these common names to make it easier to understand).
This works, but my question is could this be done using just ETL without any coding.


        Map<Integer, Vertex> bookMap = new HashMap<>();
        String queryString = "SELECT * FROM Books";
        Statement statement = connection.createStatement();
        ResultSet rs = statement.executeQuery(queryString);
        while (rs.next()) {
            Vertex bookVertex = graph.addVertex("class:Book");
            Integer bookId  = rs.getInt(1);
            bookVertex.setProperty(...);
            // fill vertex from result set
            bookMap.put(bookId, bookVertex);
        }
        rs.close();
        statement.close();
       
        Map<Integer, Vertex> authorMap = new HashMap<>();
        queryString = "SELECT * FROM Authors";
        statement = connection.createStatement();
        rs = statement.executeQuery(queryString);
        while (rs.next()) {
            Vertex authorVertex = graph.addVertex("class:Author");
            Integer authorId  = rs.getInt(1);
            authorVertex.setProperty(...);
            // fill vertex from result set
            authorMap.put(authorId, authorVertex);
        }
        rs.close();
        statement.close();
       
       
        graph.commit();
       
        queryString = "SELECT * FROM BookAuthor";
        statement = connection.createStatement();
        rs = statement.executeQuery(queryString);
       
       
        while (rs.next()) {
           
            Integer bookId = rs.getInt(2);
            Integer authorId = rs.getInt(2);
           
            Edge edge = graph.addEdge(null, bookMap.get(bookId), authorMap.get(authorId), "BookAuthor");
            edge.setProperty(....);
        }
        rs.close();
        statement.close();
       
        graph.commit();

Bojan Vukotić

oläst,
21 okt. 2014 03:07:302014-10-21
till orient-...@googlegroups.com

I tried to use ETL as described here  http://www.orientechnologies.com/docs/last/orientdb-etl.wiki/Import-from-DBMS.html
I used slightly modified approach (I don't know is it a correct one). I tried to import BookAuthor as vertex first (went fast and with no problems),  than I tried to import Book as a vertex and to create out relation from Book to BookAuthor. This lasted for ages (I have about 50 000 rows in Books table and 150 000 in BookAuthor)

{
  "config": {
    "log": "error"
  },
  "extractor" : {
    "jdbc": { "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
              "url": "jdbc:sqlserver://localhost:1433;DatabaseName=SomeDB",
              "userName": "sa",
              "fetchSize": 500,
              "userPassword": "sa",
              "query": "select * from BookAuthor" }
  },
  "transformers" : [
    { "vertex": { "class": "BookAuthor"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:d:/orientdb/myorient",
      "dbAutoCreate": true,
      "dbType": "graph",
      "batchCommit": 1000
    }
  }
}

{
  "config": {
    "log": "error"
  },
  "extractor" : {
    "jdbc": { "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
              "url": "jdbc:sqlserver://localhost:1433;DatabaseName=SomeDB",
              "userName": "sa",
              "fetchSize": 500,
              "userPassword": "sa",
              "query": "select * from Book" }
  },
  "transformers" : [
   { "vertex": { "class": "Book"} },
   { "edge": { "class": "Book_BookAuthor", "direction" : "out",
            "joinFieldName": "bookId",
            "lookup":"BookAuthor.bookId", "unresolvedLinkAction":"CREATE"} }  
  
  ],
  "loader" : {
    "orientdb": {
       dbURL": "plocal:d:/orientdb/myorient",
      "dbAutoCreate": true,
      "dbType": "graph",
      "batchCommit": 1000

    }
  }
}





On Monday, 20 October 2014 20:51:22 UTC+2, Lvc@ wrote:

BojanV

oläst,
24 okt. 2014 08:36:252014-10-24
till orient-...@googlegroups.com
No comments on this?

Is there any configuration tweaks to be done to speed up things?

Enrico Risa

oläst,
24 okt. 2014 09:34:512014-10-24
till orient-...@googlegroups.com

How many edges per seconds do you write?

BojanV

oläst,
24 okt. 2014 09:43:522014-10-24
till orient-...@googlegroups.com
This answer is when I am working import from Javacode:

It differs - if my vertices are all in default class then around 50 edges per second, If my vertices are in different classes then inserting is slower - around 30. If I use OrientGraphNoTx instead of OrientGraph things are little bit faster but still not good enough. I posted one version of my code a few posts above. (Note: my machine is pretty good, and I didn't do any additional configuring of OriendDB - I just downloaded it and used it  )

Enrico Risa

oläst,
24 okt. 2014 09:49:452014-10-24
till orient-...@googlegroups.com
Do you use the Massive Insert intent ?

--

BojanV

oläst,
24 okt. 2014 09:51:252014-10-24
till orient-...@googlegroups.com, enric...@gmail.com
Yes, I declare it in this way, I hope it is correct

        OrientGraphNoTx graph = new OrientGraphNoTx("plocal:D:/orientdb/atlas");
        graph.getRawGraph().declareIntent(new OIntentMassiveInsert());

Enrico Risa

oläst,
24 okt. 2014 10:12:082014-10-24
till orient-...@googlegroups.com
Yes it is correct

i have a similar importer. 50 edges per seconds are too low
seems strange

Can you try to use  the OrientGraph and commit every 1000 entries?

you can use graph.begin()

and graph.commit()

to handle the transaction


BojanV

oläst,
24 okt. 2014 11:18:292014-10-24
till orient-...@googlegroups.com, enric...@gmail.com
If I use OrientGraphNoTx (as recommended) does begin/commit have sense at all?

Luca Garulli

oläst,
24 okt. 2014 11:20:162014-10-24
till orient-database, Enrico Risa
Hi,
Try Tx and commit every 1k: 

      tx: true,
      batchCommit: 1000,


Lvc@

BojanV

oläst,
28 okt. 2014 06:22:082014-10-28
till orient-...@googlegroups.com, enric...@gmail.com
Regarding this slow import when doing it from code - I found how to improve it (maybe ETL thread is not the right place to describe it)

My looked like:

OrientGraphNoTx graph = new OrientGraphNoTx("plocal:D:/orientdb/mydb");
// do importing of vertices - this part was ok and fast engough
// do importing of edges - this part was terrible slow.

graph.commit();
graph.shutdown();

I tried many variants - to keep vertices in map instead of reading them from db when importing edges, but none of that was good enough.
Trick is this:

OrientGraphNoTx graph = new OrientGraphNoTx("plocal:D:/orientdb/mydb");
// do importing of vertices -
graph.commit(); // I think this is optional if I am using OrientGraphNoTx
graph.shutdown();

graph = new OrientGraphNoTx("plocal:D:/orientdb/mydb");
// do importing of edges
graph.commit(); // I think this is optional if I am using OrientGraphNoTx
graph.shutdown();

The trick is to close graph after importing of vertices and open it again when importing edges. For some reason this is much, much faster before!

Gregory Zhukovsky

oläst,
28 feb. 2016 13:29:342016-02-28
till orient-...@googlegroups.com
Hello Lvc@,

Can you please be more specific on that approach?
I'm trying to do exactly (at least I think so, I'm a newbie:)) this. 

I want OrientDB to find the routes between airports, basing on the flight information - even if there's no direct flight (to find the route with connections).

So, in my RDBMS (SQL Server) I have two related tables - Destinations and Flights, while each flight contain two Destination IDs - From and To.
In the graph data model, I want it to be like vertices-Destinations that are connected with edges-flights.
I've managed to import the destinations; but I can't find the way to construct the second ETL, that will create just the edges from my Flights table; while googling it, I met some people with the same question (ex. http://orient-database.narkive.com/d8c4b82y/orientdb-etl-edge-creation-help or https://stackoverflow.com/questions/32778905/import-edges-to-orientdb-using-etl), but they aren't seem to be resolved.

I use the Community Edition v.2.1.11,
Any help will be appreciated a lot,
Gregory
 

On Monday, August 25, 2014 at 2:23:55 PM UTC+3, Lvc@ wrote:
Curtis,
You could create 2 or more etl scripts, some that extract vertices and other that extract edges.

Lvc@



On 25 August 2014 12:25, 'Curtis Mosters' via OrientDB <orient-...@googlegroups.com> wrote:
Hi Luca, well your example is just reading one table from MySQL. One table with creating edges on it's entries. But in the usual cases that's not the way how to import from a MySQL. Or maybe I just didn't understood that example completely. But so far as I have seen it, you used one table from MySQL that contained {id,...,friend_id}. You created profiles on that id and then created edge on that id's. Is that right?

But just once again I have those data as usually stored in another table. So that cannot work for me.

Am Donnerstag, 21. August 2014 15:56:05 UTC+2 schrieb Lvc@:
On 21 August 2014 14:45, 'Curtis Mosters' via OrientDB <orient-...@googlegroups.com> wrote:
This is a great example Luca, thanks first of all.

But could you additionally add information about the MySQL schema and do I understand it correctly that you get the classes Person,Friend and an edge. But where comes the friend data? 


Friend in my case is an Edge class. (create class Friend extends E). It's useful to create ad-hoc edge classes instead that the generic "E":


Lvc@

 

Am Donnerstag, 21. August 2014 13:25:47 UTC+2 schrieb Lvc@:
Hi Curtis,
ETL is very powerful and we're improving it everyday with users' feedback. We want to let it to be final for 2.0, so we can bundle with it. By the way Enterprise Edition will have a Web Interface to edit it, and in the future also to debug it.

Unfortunately I don't understand in your example table1 & 2 with both name_id field. Look at this example about Friendship. I want to connect a social network like app. 

This is an example of ETL configuration. My comments begins with // but remove them from real config file:

{
  config: {
    verbose: true
  },
  begin: [
  ],
  extractor: {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
    "url": "jdbc:mysql://localhost/mysocialnetwork",
    "userName": "root",
    "userPassword": "root",
    "query": "select * from profile"
   }
  },
  transformers : [
    {
      // THIS IS ONLY A LOG, USEFUL TO TRACE WHAT ARRIVES FROM MYSQL
      log: {
        prefix: "MySQL -> "
      }
    },
    {
      merge: {
        // LOOKUP BY ID: IF ALREADY PRESENT MERGE THE FIELDS BY OVERWRITING THE DIFFERENT ONES
        joinFieldName: "id",
        lookup: "Profile.id"
      }
    },
    {
      vertex: {
        // TRANSFORM IT IN A VERTEX
        class: "Profile"
      }
    },
    {
      edge: {
        // CONNECT THE FRIEND IN MYSQL COLUMN 'friend_id'
        class: "Friend", // Friend is the edges' class
        joinFieldName: "friend_id", // 
        lookup: "Profile.id",
        unresolvedLinkAction: "CREATE", // IF DOESN'T EXIST YET, CREATE THE VERTEX EMPTY. WILL BE FURTHER UPDATED
        if: "friend_id is not null"
      }
    },
    {
      log: {
        prefix: "Vertex -> "
      }
    }
  ],
  loader : {
    orientdb: {
      dbURL: "remote:localhost/yourdb",
      dbUser: "admin",
      dbPassword: "admin",
      dbAutoCreate: true,
      tx: true,
      batchCommit: 1000,
      dbType: "graph",
      classes: [
      ],
      indexes: [
      ]
    }
  }
}


Lvc@



On 21 August 2014 13:05, 'Curtis Mosters' via OrientDB <orient-...@googlegroups.com> wrote:
Hey so I have now some experiences with ETL. I think this is great for import. That's why I want to improve the importing process now.
  1. First of all is it possible to run several imports parallel(not with several consoles) or if one is done start another instantly?
  2. How do I create an edge (https://github.com/orientechnologies/orientdb-etl/wiki/Transformer#edge is not helpful for me)

Let's say I have 2 tables in MySQL:

table1 with {name_id,name} and table2 with {name_id,text_id,text}

Now I want to get them connected (name_id) in OrientDB.

One way would now be to import both tables as Class. That's easy and then create edges with the name_id. Everything is fine with that, but it would be much easier to generate edges with ETL if that is anyway possible. So I don't know what is more efficient. Someone else also said that Sails (https://github.com/vjsrinath/sails-orientdb) might be a good choice.

If ETL allows that I could imagine about the following way:

  • read line of table1 -> safe in memory
  • look in table2 if name_id exists
    • -> if yes, create vertex Tabel1 and Table2
    • -> if no, create vertex Tabel1 and Table2 and create an edge
  • and so on

Of course this is very crazy if you imagine 30 tables or so. Was just an idea, just to check if that is already possible somehow.

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Svara alla
Svara författaren
Vidarebefordra
0 nya meddelanden