OGraphBatchInsert (orientdb-community-2.0-SNAPSHOT)

Emin Agassi

Jun 19, 2015, 9:27:48 AM

to orient-...@googlegroups.com

Hello Luigi or Luca,

I have a question regarding the new batch insert class.
I have a database that contains 744,496 rows, which are loaded as vertices, and 6,445,621 rows, which are loaded as edges.
I am using the new OGraphBatchInsert.
I am following the procedure described in the Java comments: create the edges first, then set the vertex properties.
The createEdge() API requires two Long IDs, one for each vertex.
Today I use an incremental counter to generate these IDs.
I then use the same IDs when setting the vertex properties.

Questions:
Does it matter how large these generated IDs get?
Should the IDs be used in the same sequential order in which they were created, both for createEdge() and then for setVertexProperties()?

Creating edges is very fast for a graph of this size, but setting vertex properties is slow. Do you know why? Could this be related to the order of the vertex IDs differing between the createEdge() calls and the setVertexProperties() calls?

I am not sure why setting vertex properties is so much slower. I am not setting large properties; I only have 4 properties to set.

Thank you for your help
Emin

Luigi Dell'Aquila

Jun 19, 2015, 10:25:18 AM

to orient-...@googlegroups.com

Hi Emin,

The batch insert is very fast because it does most of its work in RAM and then flushes the raw data to disk all at once.

The maximum vertex ID matters a lot, because OrientDB will create (and in some cases, during the import, destroy) as many records as that number. If you can keep it low, you will get better performance.

The same goes for the order of the vertex IDs in setVertexProperties(): I strongly suggest calling it with the IDs fully sorted.

In any case, the records are actually flushed to the clusters only when you invoke setVertexProperties() (before that, everything happens in RAM). That is why you see a slowdown at that point.

Out of curiosity, what insert rate are you getting?

Thanks

Luigi
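
The suggestion to set vertex properties in fully sorted ID order can be illustrated with a small plain-Java sketch. It makes several assumptions: the class SortedVertexPass, its method names, and the sample edge array are hypothetical, and the LongConsumer callback only stands in for the real property write, so no OrientDB API is called here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of OrientDB.
final class SortedVertexPass {

    /** Collects every vertex ID referenced by the edges, ascending and without duplicates. */
    static List<Long> sortedVertexIds(long[][] edges) {
        TreeSet<Long> ids = new TreeSet<>();
        for (long[] edge : edges) {
            ids.add(edge[0]); // source vertex ID
            ids.add(edge[1]); // target vertex ID
        }
        return new ArrayList<>(ids);
    }

    /** Visits each vertex ID once, lowest first. The callback is a stand-in for the property write. */
    static void forEachVertexInOrder(long[][] edges, LongConsumer setProperties) {
        for (long id : sortedVertexIds(edges)) {
            setProperties.accept(id);
        }
    }

    public static void main(String[] args) {
        // Edges may be created in any order; only the property pass is sorted.
        long[][] edges = { {5, 2}, {2, 1}, {5, 1} };
        forEachVertexInOrder(edges, id -> System.out.println("set properties for vertex " + id));
    }
}
```

In a real import the callback body would be the place for the setVertexProperties() call; the sorting step itself does not depend on how the edges were ordered.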

Emin Agassi

Jun 19, 2015, 11:20:59 AM

to orient-...@googlegroups.com

Hi Luigi,

What do you mean by insert rate? How long it takes to load a graph of this size?
Currently, it takes 4 minutes to load 744,496 rows as vertices and 6,445,621 rows as edges.
I am not sorting the vertex IDs, and I am not keeping the same order that I used when creating the edges.
I would like to confirm: are you suggesting that I use the vertex IDs in the same order as when creating the edges?
Or are you suggesting that I just sort the vertex IDs after I create them for createEdge() and use the sorted IDs for setVertexProperties()?

Also, my problem is that these newly generated IDs are not the IDs used in the database, so I need to map between the generated IDs and the object IDs stored in the DB.
I ended up with a HashMap between the new IDs and the IDs coming from the DB. Otherwise, I would not know which object I am setting vertex properties for.

Thanks
Emin
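
The counter plus HashMap described above could be combined into one small allocator, sketched here in plain Java. Everything in it is an assumption made for illustration: DenseIdAllocator and its methods are hypothetical names, the IDs start at 0 by choice, and no OrientDB class is involved. Because the generated IDs are consecutive, the reverse lookup can be a list indexed by ID, and walking that list from 0 upward visits the IDs in ascending order without a separate sort.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OrientDB.
final class DenseIdAllocator<K> {
    private final Map<K, Long> idByKey = new HashMap<>(); // database key -> generated ID
    private final List<K> keyById = new ArrayList<>();    // generated ID -> database key

    /** Returns the generated ID for a database key, assigning the next free ID on first use. */
    long idFor(K dbKey) {
        Long existing = idByKey.get(dbKey);
        if (existing != null) {
            return existing;
        }
        long id = keyById.size(); // next consecutive ID, so the maximum ID stays as low as possible
        idByKey.put(dbKey, id);
        keyById.add(dbKey);
        return id;
    }

    /** Reverse lookup; fails loudly if the ID does not fit in an int list index. */
    K keyFor(long id) {
        return keyById.get(Math.toIntExact(id));
    }

    long size() {
        return keyById.size();
    }

    public static void main(String[] args) {
        DenseIdAllocator<String> ids = new DenseIdAllocator<>();
        // While reading edge rows: translate both endpoints to generated IDs.
        long from = ids.idFor("row-900");
        long to = ids.idFor("row-17");
        System.out.println("edge " + from + " -> " + to);
        // Property pass: ascending generated IDs, each mapped back to its database key.
        for (long id = 0; id < ids.size(); id++) {
            System.out.println("vertex " + id + " is " + ids.keyFor(id));
        }
    }
}
```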

Luigi Dell'Aquila

Jun 19, 2015, 12:24:50 PM

to orient-...@googlegroups.com

Hi Emin,

I'm suggesting that you use sorted vertex IDs when you insert the properties.

Thanks

Luigi

Emin Agassi

Jun 19, 2015, 1:06:35 PM

to orient-...@googlegroups.com

Thank you.
I also had to modify the OGraphBatchInsert class to support storing BLOBs in bulk. I added this code to createVertex and setVertexProperties:

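if (properties != null && properties.containsKey("BLOB")) {
  // Store the BLOB payload as its own binary record
  String xmlBlob = (String) properties.get("BLOB");
  ORecordBytes record = new ORecordBytes(xmlBlob.getBytes());
  record.save();
  // Point the document's BLOB field at the saved binary record
  doc.field("BLOB", record);
  // Drop the raw string from the property map
  properties.remove("BLOB");
}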

This makes the bulk load slower, but not extremely slow.
Does this look OK to you, or is there a better method?

Grazie!
Emin
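
One detail in the snippet above may be worth a second look: getBytes() with no argument encodes with the JVM's default charset, which depending on the Java version and settings can differ between machines. A minimal sketch of the same extraction step with an explicit charset follows. It assumes UTF-8 is the desired encoding; BlobProperty and extractBlobBytes are hypothetical names, and the OrientDB record handling is deliberately left out so the example runs on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of OrientDB.
final class BlobProperty {

    /** Removes the "BLOB" entry from the map and returns it as UTF-8 bytes, or null if absent. */
    static byte[] extractBlobBytes(Map<String, Object> properties) {
        if (properties == null) {
            return null;
        }
        Object value = properties.remove("BLOB"); // also drops the raw string from the map
        if (value == null) {
            return null;
        }
        // Explicit charset (an assumption): the bytes no longer depend on the platform default.
        return ((String) value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("name", "example");
        properties.put("BLOB", "<doc>payload</doc>");
        byte[] bytes = extractBlobBytes(properties);
        System.out.println(bytes.length + " bytes extracted, remaining keys: " + properties.keySet());
    }
}
```

The returned byte array would then be passed to the record constructor in place of xmlBlob.getBytes().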

Luigi Dell'Aquila

Jun 22, 2015, 9:11:11 AM

to orient-...@googlegroups.com

Hi Emin,

I think it's OK for a single use case. For the batch insert I'll implement something more general anyway; could you please open a new issue about this?