We're evaluating Titan to see whether it will satisfy our needs. For that we're importing a large graph into Titan backed by Google BigTable. We're using Titan 1.0.0. The graph contains 800M+ vertices and 25B+ edges. Schema-wise, I think the only relevant part is that the vertices are keyed by a non-long value in the original store, so we had to define an index on the key property to look up vertices efficiently. Vertex and edge properties are pretty minimal: 3-5 properties with small values on each.
We were able to import the vertices in a relatively short amount of time (less than a day). When we started importing edges, however, the pace was roughly 100 times slower. Some slowdown is expected, since each edge requires looking up its endpoint vertices first, but a slowdown of this magnitude is a deal breaker.
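For context on what we mean by the endpoint-lookup cost: one idea we're considering is caching the external-key-to-Titan-id mapping during the vertex pass, so the edge pass can resolve endpoints from memory instead of hitting the index each time. A minimal sketch of that idea (class and method names are our own illustration, not Titan API; a real loader would record `vertex.longId()` at creation time):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: remember the Titan-assigned long id for each external key
// during vertex import, so edge import can resolve endpoints without
// an index lookup per edge.
public class IdCache {
    private final Map<String, Long> keyToTitanId = new HashMap<>();

    // Called once per vertex, right after creation, with the id Titan assigned.
    public void put(String externalKey, long titanId) {
        keyToTitanId.put(externalKey, titanId);
    }

    // Called during edge import; null means a cache miss, in which case
    // the loader would fall back to the index on the key property.
    public Long resolve(String externalKey) {
        return keyToTitanId.get(externalKey);
    }
}
```

At our scale (~800M vertices) a single in-heap map won't fit on one worker, so this would need to be sharded or backed by an embedded key-value store, but it illustrates where the per-edge read cost comes from.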
How can we investigate where the slowdown is occurring?
We've also observed (per the BigTable metrics in the Google console) that during vertex import we see about 800 write requests/s against 22K read requests/s. During edge import, writes drop to about 300/s while reads climb to 85K/s. It's not clear why read requests are so high relative to writes; during edge import the ratio is almost 300:1. Any idea why this might be happening?
Thanks
PS: We have tried using BulkLoaderVertexProgram, but we couldn't get it to work due to dependency conflicts between the Titan, TinkerPop, and Google BigTable libraries. If anyone has successfully achieved this, please let me know.