TDB Performance


Tim Smith

Jul 27, 2020, 7:25:50 PM
to topbrai...@googlegroups.com
Hi,

I recently needed to convert 450+ XML files to triples.  Based on my previous experience trying to import a large XML file that took 4 hours to complete, I used SPARQLMotion to convert all 450 files in less than 5 minutes.  A HUGE improvement!  (I tried the import method and it ran 10 hours and only converted ~200 files.  Most of the files are small).

As output, I wrote a ttl file to disk (~1GB).  I then decided I wanted the triples in TDB so I opened the TTL file and ran an insert to push the approximately 5 million triples into TDB.  This ran for over an hour with no sign of completing. 

Next I tried to use the SM module ExportToTDB.  This ran quickly but resulted in no triples in TDB.

Finally, I tried the SM module PerformUpdate with which I have had varying levels of success in the past.  It appears to be following the same course as the manual insert query in that the CPU utilization indicates something is happening but I see no change in the TDB files on disk.

What should I expect for TDB performance?  I did not think of 5M triples as being large and, based on published performance figures, I was expecting <30 minutes for insertion.  I'm using TBC 6.4 Beta on Windows 10 with a quad-core i7, SSD, and 32GB RAM; TBC is using only 4GB of the 28GB available to it (Xmx=28GB).

Thanks for your input,

Tim

Irene Polikoff

Jul 27, 2020, 8:46:18 PM
to topbrai...@googlegroups.com
Tim,

Not sure why ExportToTDB resulted in no triples for you.

Since you are using TBC-ME, try the following:

1. Open the .tdb or .xdb of the target data graph with TBC.
2. Then right-click on the TTL file and select Import > Import Triples from this File into current Model

This uses a streaming importer that bypasses validation and a few other things.

On my machine for 6.5M triples this took less than 15 minutes.

In 6.4, this streaming import will also be available in EDG.


Tim Smith

Jul 28, 2020, 12:03:36 AM
to topbrai...@googlegroups.com
Hi Irene,

Thanks for the suggestion.  The SM Perform Update has been running for nearly 6 hours and counting...

HOWEVER... I tried your suggestion and two things happened:

#1 - I realized I dramatically underestimated the number of triples - I have 10.9M.
#2 - I was able to load them into TDB using Import -> Import Triples from this File into the Current Model in less than THREE MINUTES.

WOW.

What a difference.  I will remember this performance improvement!

Thank you!  What a time saver.  I thought I would be stuck until at least tomorrow.

Tim
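
For comparison, the timings quoted in this thread can be turned into rough lower bounds on load throughput. The snippet below is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the "less than N minutes" figures above are treated as upper bounds on elapsed time, so the results are minimum rates rather than measurements.

```python
def min_triples_per_second(triples: int, max_minutes: float) -> float:
    """Lower bound on load rate, given an upper bound on elapsed time."""
    return triples / (max_minutes * 60)


# Figures quoted in the thread: (triple count, upper bound in minutes).
runs = {
    "expected from published figures (5M, <30 min)": (5_000_000, 30),
    "streaming import, Irene (6.5M, <15 min)": (6_500_000, 15),
    "streaming import, Tim (10.9M, <3 min)": (10_900_000, 3),
}

for label, (triples, minutes) in runs.items():
    rate = min_triples_per_second(triples, minutes)
    print(f"{label}: > {rate:,.0f} triples/s")
```

On these numbers the streaming import of 10.9M triples works out to roughly 60,000 triples per second or better, well above the rate implied by the original 30-minute expectation.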
