Importing large data


pulseof...@gmail.com

Nov 11, 2014, 7:55:58 AM
to sta...@clarkparsia.com
Hello everyone,
I am currently trying to import Freebase into Stardog.

So I downloaded BaseKB and cleaned up the triples.

Now I have 200 parts of about 800 MB each.

What's the best way to import that data?

I tried

stardog data add *

But this seems to run forever...

Even adding just 10 files (8 GB) never finishes.


What's the best way to import this much data?

Rob Vesse

Nov 11, 2014, 8:12:11 AM
to Stardog Mailing List
Firstly, are the files to be loaded on the same machine as Stardog or on a different one?

If they are on the same machine, add the --server-side flag (http://docs.stardog.com/man/data-add.html). Without it, the data add command transmits the files to the server over the network protocol, which adds a lot of unnecessary overhead when the files already live on the same machine as the server.
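For example (the database name and file paths below are only placeholders for illustration), the call would look something like:

stardog data add --server-side myDb /data/basekb/part-000.ttl /data/basekb/part-001.ttl

With --server-side the paths are resolved by the server itself, so nothing is streamed over the network.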

Generally, the advice I have always seen given in the past is that large bulk loads are best done at database creation time with the administrative db create command (http://docs.stardog.com/man/db-create.html) rather than via the data add command, e.g.

./stardog-admin db create -t D -n example *

In this case Stardog will assume that the files live on the same server as the client.

As I understand it, this performs better because when creating a new database Stardog can manipulate the index files freely and build them directly, without having to worry about transactions, differential indexes, etc.

The --index-triples-only option passed to this command may also improve performance, provided the data to be loaded contains only triples and no quads (otherwise queries involving named graphs will have poor performance).
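For instance (placeholder database name and paths again, and double-check the exact flag placement against the db-create man page linked above), a triples-only bulk load might look like:

./stardog-admin db create -t D --index-triples-only -n example /data/basekb/*.ttl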

Hope this helps,

Rob


pulseof...@gmail.com

Nov 12, 2014, 11:21:34 AM
to sta...@clarkparsia.com
Hello Rob,
Thank you for your answer.

I tried the import with these settings, but it also did not work :-(

michael...@gmail.com

Nov 12, 2014, 3:52:24 PM
to sta...@clarkparsia.com
How much RAM do I need for the whole BaseKB? I currently have 48 GB... maybe that is not enough.

Mike Grove

Nov 17, 2014, 7:44:49 AM
to stardog
Information about capacity planning is in the docs [1].
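As a rough sketch only (the heap and direct-memory sizes below are illustrative, not a recommendation for BaseKB), the server's JVM memory is typically set through the STARDOG_JAVA_ARGS environment variable before starting the server:

export STARDOG_JAVA_ARGS="-Xms8g -Xmx8g -XX:MaxDirectMemorySize=24g"
./stardog-admin server start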

Cheers,

Mike


On Wed, Nov 12, 2014 at 3:52 PM, <michael...@gmail.com> wrote:
How much RAM do I need for the whole BaseKB? I currently have 48 GB... maybe that is not enough.

michael...@gmail.com

Dec 4, 2014, 11:31:30 AM
to sta...@clarkparsia.com, pulseof...@gmail.com
The problem is still the same.
When I choose too many files to import at once, the import itself doesn't even start.

CPU usage sits at about 70-80%, whereas the import usually consumes the full 800%.


Is this a bug?

Mike Grove

Dec 4, 2014, 11:40:25 AM
to stardog, pulseof...@gmail.com
On Thu, Dec 4, 2014 at 11:31 AM, <michael...@gmail.com> wrote:
The problem is still the same.
When I choose too many files to import at once, the import itself doesn't even start.

CPU usage sits at about 70-80%, whereas the import usually consumes the full 800%.

Is this a bug?

As Rob suggested, bulk loading is best performed at database creation time to get the optimal write performance.  Stardog 3.0 will have write transaction performance equivalent to bulk loads.

Cheers,

Mike
 


michael...@gmail.com

Dec 4, 2014, 11:43:00 AM
to sta...@clarkparsia.com, pulseof...@gmail.com
Hi Michael,
I also tried this method and there was the same error.

Could it be that Stardog cannot handle a large number of files? It seems to me that it doesn't even start the import.

Mike Grove

Dec 4, 2014, 11:44:21 AM
to stardog, pulseof...@gmail.com
On Thu, Dec 4, 2014 at 11:43 AM, <michael...@gmail.com> wrote:
Hi Michael,
I also tried this method and there was the same error.

Could it be that Stardog cannot handle a large number of files? It seems to me that it doesn't even start the import.

No.  We regularly bulk load hundreds of files when loading very large datasets.

Cheers,

Mike

Evren Sirin

Dec 4, 2014, 11:46:48 AM
to Stardog, pulseof...@gmail.com
Note that loading progress is printed in the server log file, not on the client side. If you do something like `tail -f stardog.log` you can see the progress monitor on the terminal. Do you not see any output in stardog.log? What is the exact command you are using to create the database?
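For example, assuming STARDOG_HOME points at the server's home directory (which is where stardog.log normally lives):

tail -f $STARDOG_HOME/stardog.log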

Best,
Evren

michael...@gmail.com

Dec 4, 2014, 12:10:35 PM
to sta...@clarkparsia.com, pulseof...@gmail.com
I tried again now, but I got another problem.
With this command:
/opt/stardog-2.2.2/bin/stardog-admin db create -n basekbnew /var/rdf/dump_09_21/literals/*.clean

File: /var/rdf/dump_09_21/a/a-m-00045.nt.clean Message: Expected '<', found: a [line 1]

<http://rdf.mindbabble.com/ns/american_football.football_conference> a <http://www.w3.org/2000/01/rdf-schema#Class>.

And with other files:

File: /var/rdf/dump_09_21/literals/literals-m-00009.nt.clean Message: Expected '.', found: ; [line 396]



<http://rdf.mindbabble.com/ns/g.11b60n70vp> <http://rdf.mindbabble.com/ns/measurement_unit.dated_percentage.date> "2014-06"^^<http://www.w3.org/2001/XMLSchema#gYearMonth>;
    <http://rdf.mindbabble.com/ns/measurement_unit.dated_percentage.rate> "6.4".


Why doesn't it recognize the Turtle files correctly?



Rob Vesse

Dec 5, 2014, 5:16:38 AM
to Stardog Mailing List
You've named them with a .nt extension, which means Stardog assumes they are N-Triples, which they are not; Turtle files normally have a .ttl extension.

You simply need to rename your files to have the correct extension, and then Stardog will process them as Turtle.
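For example (assuming a Unix shell and the .nt.clean naming shown above; adjust the glob to your directory layout), something like:

for f in /var/rdf/dump_09_21/*/*.nt.clean; do mv "$f" "${f%.nt.clean}.ttl"; done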

Rob
