Bulk data load Performance Benchmark and Tuning

Shivanandan Gupta

Feb 19, 2015, 1:19:20 AM
to orient-...@googlegroups.com
Hi All,

I tried loading data from a CSV file into an in-memory graph database class, and the statistics I am getting are given below. I used a JSON ETL file to load the data from the CSV. If anyone has a better way to load the same amount of data faster, please suggest it.

We are planning to implement a near-real-time DWH on OrientDB.


RecordsLoaded   AttributesPerRecord   DataVolume (CSV file)   TimeTaken (ms)   TimeTaken (minutes)
1,000,320       75                    250 MB                  261365           4.36




{
  "source": { "file": { "path": "F:\Work\MDATA-ETL\s_asset.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "S_ASSET" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/databases/indb",
      "dbUser": "root",
      "dbPassword": "root",
      "dbAutoCreate": true,
      "tx": false,
      "batchCommit": 10000,
      "dbType": "graph"
    }
  }
}


Thanks in advance.

Regards,
Shivanandan Gupta

Luigi Dell'Aquila

Feb 19, 2015, 2:56:58 AM
to orient-...@googlegroups.com
Hi Shivanandan,

the easy way to go faster is to use plocal instead of remote, but I don't know if that is possible in your case (any other OrientDB instances using the database have to be shut down while the ETL is running...)

Luigi



Shivanandan Gupta

Feb 19, 2015, 4:02:16 AM
to orient-...@googlegroups.com
Hi Luigi,

Thanks for your response. I am loading the CSV file from the same server where OrientDB is installed. I tried using plocal (do I create the DB as plocal?). Can you please show me how to go about it? I am a newbie to OrientDB.

We are trying to build a DWH-style data model, with facts and dimensions, in OrientDB.

Thanks,
Shivanandan Gupta

Luigi Dell'Aquila

Feb 19, 2015, 4:10:04 AM
to orient-...@googlegroups.com
Hi Shivanandan,

you can just replace this

  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/databases/indb",
      "dbUser": "root",
      "dbPassword": "root",
      "dbAutoCreate": true,
      "tx": false,
      "batchCommit": 10000,
      "dbType": "graph"
    }
  }

with this

  "loader": {
    "orientdb": {
      "dbURL": "plocal:/your/absolute/path/to/OrientDB/databases/indb",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true,
      "tx": false,
      "batchCommit": 10000,
      "dbType": "graph"
    }
  }

Then launch the ETL again. If the database does not exist, the ETL will create it for you.
Just make sure that no other OrientDB instance is running on that database while the ETL is running; otherwise you will get an IOException.
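
If you keep several ETL files around, the same replacement can be scripted. This is only a rough sketch, assuming the config has already been parsed into a Python dict; the helper name use_plocal is made up for illustration, and the path is the same placeholder as above, so substitute your own.

```python
import copy
import json


def use_plocal(config, db_path, user="admin", password="admin"):
    """Return a copy of an ETL config whose loader points at a plocal database.

    use_plocal is a hypothetical helper, not part of OrientDB.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    loader = updated["loader"]["orientdb"]
    loader["dbURL"] = "plocal:" + db_path
    loader["dbUser"] = user
    loader["dbPassword"] = password
    return updated


# Loader section as posted earlier in the thread (remote connection).
remote_config = {
    "loader": {
        "orientdb": {
            "dbURL": "remote:localhost/databases/indb",
            "dbUser": "root",
            "dbPassword": "root",
            "dbAutoCreate": True,
            "tx": False,
            "batchCommit": 10000,
            "dbType": "graph",
        }
    }
}

# Placeholder path: replace with the absolute path of your own database.
plocal_config = use_plocal(
    remote_config, "/your/absolute/path/to/OrientDB/databases/indb"
)
print(json.dumps(plocal_config, indent=2))
```

Only dbURL, dbUser and dbPassword change; the other loader settings are carried over as they were.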

Regards

Luigi

Shivanandan Gupta

Feb 19, 2015, 4:20:02 AM
to orient-...@googlegroups.com
Thanks Luigi, I changed it and it worked. The statistics are given below:

RecordsLoaded   AttributesPerRecord   DataVolume   TimeTaken (ms)   TimeTaken (seconds)
1,000,320       75                    250 MB       50435            50


Thanks
Shivanandan Gupta

Shivanandan Gupta

Feb 19, 2015, 4:45:19 AM
to orient-...@googlegroups.com
Hi Luigi,

I used the JSON below for the ETL. The data load ran, but I am not able to see any vertices in the class.


{
  "source": { "file": { "path": "C:\odb\s_asset.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "S_ASSET" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "memory:/temp/databases/eigenin",
      "dbUser": "root",
      "dbPassword": "root",
      "dbAutoCreate": true,
      "tx": false,
      "batchCommit": 10000,
      "dbType": "graph"
    }
  }
}


Thanks,
Shivanandan Gupta

Luigi Dell'Aquila

Feb 19, 2015, 4:48:37 AM
to orient-...@googlegroups.com
Hi Shivanandan,

the problem here is that the ETL is creating an in-memory database, and every in-memory db is completely deleted when the VM goes down (in this case, when the ETL terminates). If you want a persistent db you have to use plocal.
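
A quick way to catch this before a long load is to look at the prefix of dbURL. The following is just an illustrative sketch, not something shipped with OrientDB: the function names are made up, and it assumes the URL always starts with a storage prefix such as memory:, plocal: or remote:.

```python
def storage_engine(db_url):
    """Return the storage prefix of an OrientDB URL, e.g. 'memory' or 'plocal'."""
    prefix, sep, _ = db_url.partition(":")
    if not sep or not prefix:
        raise ValueError("no storage prefix in " + repr(db_url))
    return prefix.lower()


def survives_etl_exit(db_url):
    """True if the loaded data is still there after the ETL process ends.

    A memory: database lives only inside the ETL's own VM, so it is gone
    when the ETL terminates. plocal: is written to disk, and remote: is
    held by a separate server process.
    """
    return storage_engine(db_url) != "memory"


for url in (
    "memory:/temp/databases/eigenin",
    "plocal:/your/absolute/path/to/OrientDB/databases/indb",
    "remote:localhost/databases/indb",
):
    status = "kept" if survives_etl_exit(url) else "lost when the ETL exits"
    print(url, "->", status)
```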

Regards

Luigi


Shivanandan Gupta

Feb 19, 2015, 5:14:04 AM
to orient-...@googlegroups.com
In our current situation, we want to load the data into an in-memory DB and have another application read it from there. Is there a way to keep the data in memory until the other application has read it?

I used plocal and the data is persisted, but the load time is high: 2 minutes for the same set of records.


RecordsLoaded   AttributesPerRecord   DataVolume   TimeTaken (ms)   TimeTaken (minutes)
1,000,320       75                    250 MB       125343           2
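
To compare the three timings reported in this thread on the same scale, they can be converted to records per second. This is a small sketch using only the figures posted above; the run labels are descriptive names chosen here, not something produced by the ETL tool.

```python
RECORDS = 1_000_320  # records loaded in every run

# (label, elapsed milliseconds) in the order the runs were posted.
runs = [
    ("first run, remote", 261365),
    ("second run, after changing the loader", 50435),
    ("third run, plocal", 125343),
]


def records_per_second(records, elapsed_ms):
    """Throughput of a load that took elapsed_ms milliseconds."""
    return records / (elapsed_ms / 1000.0)


for label, elapsed_ms in runs:
    rate = records_per_second(RECORDS, elapsed_ms)
    print(f"{label}: {elapsed_ms / 1000:.1f} s, {rate:,.0f} records/s")
```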
