Re: [orientdb] Merge of two databases


Luca Garulli

Jul 25, 2012, 9:48:01 AM
to orient-...@googlegroups.com
Hi Christian,
in OrientDB everything is based on the RID! So a cluster can't have a different cluster id in different databases.

You should edit the exported JSON file and search-and-replace the RIDs. For example, if db1 has 1,000 customers, from RID #10:0 to #10:999, and in db2 the "customer" cluster has id 20 instead of 10, then replace #10: with #20:.
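A minimal Python sketch of that search-and-replace, assuming the export stores links as plain "#<cluster>:<position>" text and that, as in Christian's case, each cluster lives in only one source database, so positions can be kept unchanged. `build_cluster_id_map` and `remap_rids` are hypothetical helpers, not part of OrientDB. Clusters are matched by name, and all RIDs are rewritten in a single regex pass, which avoids the chained-edit problem of running several plain replacements one after another (e.g. #10: to #20:, then #20: to #30:).

```python
import re

# A RID looks like "#10:999": '#', the cluster id, ':', the position in the cluster.
RID_PATTERN = re.compile(r"#(\d+):(\d+)")


def build_cluster_id_map(source_ids, target_ids):
    """Map old cluster ids to new ones by matching cluster names.

    Both arguments are dicts of cluster name -> cluster id; only clusters
    whose id actually changes end up in the result.
    """
    return {
        old_id: target_ids[name]
        for name, old_id in source_ids.items()
        if name in target_ids and target_ids[name] != old_id
    }


def remap_rids(text, id_map):
    """Rewrite every RID whose cluster id is in id_map, keeping its position.

    All RIDs are replaced in one pass, so a chain such as 10 -> 20 and
    20 -> 30 cannot turn #10:x into #30:x.
    """
    def replace(match):
        cluster = int(match.group(1))
        return "#%d:%s" % (id_map.get(cluster, cluster), match.group(2))

    return RID_PATTERN.sub(replace, text)


if __name__ == "__main__":
    id_map = build_cluster_id_map({"customer": 10}, {"customer": 20})
    print(remap_rids('{"@rid": "#10:0", "friend": "#10:999"}', id_map))
    # {"@rid": "#20:0", "friend": "#20:999"}
```

In practice you would read the exported file as text, pass it through `remap_rids`, and write it back before importing. The sketch only rewrites RID strings: a bare numeric cluster id stored elsewhere in the file is not touched, and any string value that merely looks like a RID would be rewritten too.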

In the future the importer could handle these cases more smartly.

Lvc@

On 25 July 2012 10:02, Christian Hachenberg <hache...@uni-koblenz.de> wrote:
Hi,

I've been trying really hard to get this working but haven't managed so far. I have two large databases, which I created because of limited space on one hard disk; the data is actually supposed to be in ONE database. Now that I have more disk space, I tried to merge the two into one larger database. Importing database #1 into a brand-new (i.e. empty) database works fine, but importing database #2 afterwards always gives me an exception like this:

Error on database import happened just before line 18, column 52
com.orientechnologies.orient.core.exception.OConfigurationException: Imported cluster 'dmoz270' has id=267 different from the original: 7
at com.orientechnologies.orient.core.db.tool.ODatabaseImport.importClusters(ODatabaseImport.java:500)
at com.orientechnologies.orient.core.db.tool.ODatabaseImport.importDatabase(ODatabaseImport.java:121)
at com.orientechnologies.orient.console.OConsoleDatabaseApp.importDatabase(OConsoleDatabaseApp.java:1419)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.orientechnologies.common.console.OConsoleApplication.execute(OConsoleApplication.java:238)
at com.orientechnologies.common.console.OConsoleApplication.executeCommands(OConsoleApplication.java:127)
at com.orientechnologies.common.console.OConsoleApplication.run(OConsoleApplication.java:92)
at com.orientechnologies.orient.graph.console.OGremlinConsole.main(OGremlinConsole.java:51)

The databases consist of hundreds of clusters, numbered consecutively. Of course, the cluster dmoz270 from database #2 has a different ID in its original database... but why doesn't the import handle that??? This is giving me real headaches, and I urgently need to merge these two... :-(

Thx in advance for any help!
Best, Christian


Christian Hachenberg

Jul 26, 2012, 9:31:50 AM
to orient-...@googlegroups.com
Hi Lvc@,

thanks for the quick reply. To make my point clear: this is not about having the SAME cluster twice in different databases, once with id=267 and once with id=7 (as in the stack trace above). Each cluster exists only ONCE, in just one of the databases! The id hassle only happens during the import, when OrientDB automatically ASSIGNS id=267 to the "new" cluster whose former id was 7!

I REALLY think an importer should take care of this and not insist on the same RID when importing; that will hardly ever match when merging two or more databases. This is exactly what I'd expect an "import database <file>" command to deliver!

Best, Christian

Luca Garulli

Jul 26, 2012, 10:02:28 AM
to orient-...@googlegroups.com
Hi,
sorry, but this kind of merge is not supported yet. Could you open an issue for this?

Lvc@


Christian Hachenberg

Jul 26, 2012, 10:17:44 AM
to orient-...@googlegroups.com
Of course, just did it :-) (it's issue 981)

Best, Christian
--
Christian Hachenberg

Institute for Web Science and Technologies (WeST)
Universität Koblenz-Landau
Universitätsstraße 1
56070 Koblenz
Germany

Tel.: ++49 261 287-2759
Fax : ++49 261 287-100-2759
E-Mail: hache...@uni-koblenz.de
Web: http://west.uni-koblenz.de

Gabriel Vince

Jul 27, 2012, 5:16:38 AM
to orient-...@googlegroups.com
Hi all,

just a consideration: this type of merge (a complete database merge) is rarely supported by other databases either (RDBMS or NoSQL); it is a traditional ETL discipline (how to migrate data, how to keep relational consistency, etc.).

For simpler cases, I'd advise improving the JDBC driver (so that it returns @rid) and using any ETL tool (Talend, Jasper, ...). If this feature were supported out of the box, it would be well above the market average.

It could be difficult by design: you cannot keep the @RID values, so you have to map old to new values during the import, and if an external system keeps references to the data, we are screwed.

Suggestion:
 - from the first mail, I'd guess the use case would be solved by sharding (on the server or cluster side) or by configuring where each cluster should be placed, and I'd support the sharding feature too.

Carpe diem
       Gabriel




Gabriel Vince

Jul 27, 2012, 5:25:55 AM
to orient-...@googlegroups.com
Christian,

just as a workaround: the data can be exported as JSON. If you are good with ETL (almost every good ETL tool supports JSON today), you could try to migrate the exported JSON data, not as it is, but by creating inserts so that new @RIDs are assigned. If you have related data, then you have to store the old/new RIDs for the relational (graph) mapping. :(
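The old/new RID mapping described above can be sketched in Python as a two-pass copy: first insert every record and remember the RID the target assigns, then rewrite each link through that mapping. `Record`, `merge_records` and `make_sequential_allocator` are hypothetical names; the allocator merely stands in for real inserts into the target database, and links are modeled as a simple field-name-to-RID dict.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    rid: str                                   # RID in the source export, e.g. "#7:0"
    fields: dict                               # plain (non-link) field values
    links: dict = field(default_factory=dict)  # field name -> RID of the linked record


def make_sequential_allocator(cluster_id):
    """Hypothetical stand-in for the target database: each call 'inserts' a
    record and returns the next free RID in the given cluster."""
    position = 0

    def allocate(_record):
        nonlocal position
        rid = "#%d:%d" % (cluster_id, position)
        position += 1
        return rid

    return allocate


def merge_records(records, allocate_rid):
    """Copy records under fresh RIDs and rewrite their links.

    Returns the rewritten records and the old -> new RID mapping, which is
    worth keeping if external systems still hold the old RIDs.
    """
    old_to_new = {}
    # Pass 1: insert every record (links not yet resolved) and remember its new RID.
    for rec in records:
        old_to_new[rec.rid] = allocate_rid(rec)

    # Pass 2: translate every link through the mapping. A link to a record
    # that was not part of the export raises KeyError on purpose.
    merged = []
    for rec in records:
        new_links = {name: old_to_new[target] for name, target in rec.links.items()}
        merged.append(Record(old_to_new[rec.rid], dict(rec.fields), new_links))
    return merged, old_to_new


if __name__ == "__main__":
    source = [
        Record("#7:0", {"name": "a"}, {"parent": "#7:1"}),
        Record("#7:1", {"name": "b"}),
    ]
    merged, mapping = merge_records(source, make_sequential_allocator(267))
    print(mapping)          # {'#7:0': '#267:0', '#7:1': '#267:1'}
    print(merged[0].links)  # {'parent': '#267:1'}
```

Two passes are needed because a record may link to one that has not been inserted yet. Failing loudly on unknown targets surfaces dangling references instead of silently keeping a stale RID.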

gabriel

