Property Value Issue


Nabeel Ahmed

Aug 18, 2015, 1:44:01 AM
to Getty Vocabularies as Linked Open Data
Dear Getty Team:

I am importing the explicit RDF dumps into my own triple store for internal analysis and am experiencing the following issues:

1. Importing TGNOut_PlaceType.nt fails with: '-10000' is not a valid value for datatype http://www.w3.org/2001/XMLSchema#gYear [line 414254], and the import stops.

2. Importing TGNOut_2Terms.nt fails with: '-35000' is not a valid value for datatype http://www.w3.org/2001/XMLSchema#gYear [line 14335339], and the import stops.

3. Importing ULANOut_Biographies.nt fails with: '-30000' is not a valid value for datatype http://www.w3.org/2001/XMLSchema#gYear [line 3023096], and the import stops.

Kindly suggest updates or how to fix these issues.

Regards, 

Vladimir Alexiev

Aug 18, 2015, 2:13:34 PM
to Getty Vocabularies as Linked Open Data
Hi Nabeel! What is the triple store you're using and what is the RDF import tool? Usually they are based on Jena RIOT or Sesame RIO.
-10000 is a valid year, that's 10000 BC (quite a long time in the past).

Also: if you load only the explicit triples, you won't have any inferred triples, e.g. gvp:broaderExtended, skos:broader, skos:broaderTransitive.
You need to implement your own inference as described in the doc section Inference.

Cheers!

Nabeel Ahmed

Aug 18, 2015, 10:54:47 PM
to Vladimir Alexiev, Getty Vocabularies as Linked Open Data
Hi Vladimir:

I am using GraphDB as the triple store and am using its server import feature to import the data into the store. That is where I am experiencing these issues. At that stage no inference is done, and I guess it's only a simple import.

Regards, 

--
You received this message because you are subscribed to a topic in the Google Groups "Getty Vocabularies as Linked Open Data" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gettyvocablod/GzYUXOuR38A/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gettyvocablo...@googlegroups.com.
To post to this group, send email to gettyv...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gettyvocablod/956af266-db85-4262-91ba-0662b4e2e24b%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Malik Nabeel Ahmed Awan

Vladimir Alexiev

Aug 19, 2015, 8:22:47 AM
to Getty Vocabularies as Linked Open Data, vlad...@sirma.bg, boian.s....@gmail.com, gga...@getty.edu
Weird: vocab.getty.edu also uses GraphDB and we haven't seen this problem. Or maybe we just haven't noticed it?
Which file do you see the problem with?

Nabeel Ahmed

Aug 19, 2015, 8:41:24 AM
to Vladimir Alexiev, Getty Vocabularies as Linked Open Data, boian.s....@gmail.com, gga...@getty.edu
Dear Vladimir:

I am getting issues with the following files, as mentioned in my initial post.

1. In TGNOut_PlaceType.nt
2. In TGNOut_2Terms.nt
3. In ULANOut_Biographies.nt

regards,


Boian Simeonov

Aug 19, 2015, 10:20:11 AM
to Getty Vocabularies as Linked Open Data, vlad...@sirma.bg, boian.s....@gmail.com, gga...@getty.edu
Dear Nabeel,

I am trying to reproduce your problem, unfortunately without success. I explored three different approaches to loading the data and all of them succeeded, so I need more information:
1. What is your GraphDB version?
2. Which tool are you using for loading data?
2.1 It would be useful if you explained the loading procedure in detail.

Best,
Boyan

Nabeel Ahmed

Oct 12, 2015, 2:52:44 AM
to Boian Simeonov, Getty Vocabularies as Linked Open Data, Vladimir Alexiev, gga...@getty.edu
Dear Boian:

This is a very old conversation in which I reported an issue with loading the Getty data into GraphDB. Things were moving along, but due to some issue my GraphDB instance crashed. As it was a DEV environment we had no backups, so I have to reload the data, and I am now getting the same issue again. Answers to your specific questions are:

1. What is your GraphDB version? >> GraphDB v6.4
2. Which tool are you using for loading data? >> The built-in import tool (image attached for your reference)
2.1 It would be useful if you explained the loading procedure in detail.
        >> I download the files from the Getty server using wget.
        >> Unzip them into the location specified by the GraphDB Import --> Server section.
        >> Click on Import next to the file to import it.
   ---- Image attached for your reference


And now I am again getting the same issue with the gYear property as was reported previously. Please suggest a quick solution if you can, as my data import is midway.



regards,

[Attachment: issue-to-getty.png]

Vladimir Alexiev

Oct 12, 2015, 3:37:34 AM
to Getty Vocabularies as Linked Open Data, boian.s....@gmail.com, vlad...@sirma.bg, gga...@getty.edu
Looks like a bug (over-eager validation) in the GraphDB Workbench. I've posted an internal bug (WB-716). Possible workarounds:
  • Use Sesame Console (console.sh) to load the data: that's what Getty uses
        connect <host>/openrdf-sesame .
        open <repo> .
        load <dir>/ULANOut_Full.nt into http://vocab.getty.edu/dataset/ulan .
(Note: Getty loads the Explicit exports and generates the Total exports, while the screenshot above shows that you are loading the Total exports. As it says in http://vocab.getty.edu/doc/#Total_Exports, "Please note that we have not tried out this process yet.", but it should work.)
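
When several files need loading, the three console commands above can be generated rather than typed by hand. The following is only a minimal sketch: the function name `console_script`, the host, the repository id and the file paths are hypothetical placeholders, and putting more than one ULAN file into the same named graph is an assumption, not something confirmed in this thread.

```python
def console_script(host, repo, loads):
    """Build Sesame Console input: connect, open, then one load per (file, graph) pair.

    host, repo and the file paths are placeholders supplied by the caller;
    the command syntax mirrors the three commands quoted in the post above.
    """
    commands = [
        f"connect {host}/openrdf-sesame .",
        f"open {repo} .",
    ]
    # One "load <file> into <graph> ." command per file.
    commands += [f"load {path} into {graph} ." for path, graph in loads]
    return "\n".join(commands) + "\n"


# Hypothetical usage: host, repository id and directory are made up.
ULAN_GRAPH = "http://vocab.getty.edu/dataset/ulan"
script = console_script(
    "http://localhost:8080",
    "getty",
    [
        ("/data/ULANOut_Full.nt", ULAN_GRAPH),
        ("/data/ULANOut_Biographies.nt", ULAN_GRAPH),
    ],
)
```

The resulting text can be pasted into an interactive console.sh session.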

Nabeel Ahmed

Oct 12, 2015, 4:32:16 AM
to Vladimir Alexiev, Getty Vocabularies as Linked Open Data, Boian Simeonov, gga...@getty.edu
Is it possible to chunk the data when using the first option, console.sh?

regards,



Boian Simeonov

Oct 12, 2015, 4:42:08 AM
to Getty Vocabularies as Linked Open Data, vlad...@sirma.bg, boian.s....@gmail.com, gga...@getty.edu
Dear Nabeel, 

There is another option for loading data into GraphDB. You can use the Getting Started tool, which is available in every GraphDB distribution package. Documentation for this tool is here: http://graphdb.ontotext.com/display/GraphDB6/GraphDB-SE+Configuration#GraphDB-SEConfiguration-GettingStartedApplication . With this tool you can load data very easily into an already deployed repository.

Best,
Boyan

Vladimir Alexiev

Oct 12, 2015, 5:09:39 AM
to Getty Vocabularies as Linked Open Data
console.sh doesn't have a chunking option.
However, the biggest file that Getty loads with console.sh is TGNOut_RevisionHistory.ttl (1.5 GB), so I'd first try to load the full files, and only split if that fails.

You can use the linux/unix "split" tool to split NT files by line, eg see http://linux.die.net/man/1/split.
I think the -C option is what you need: you can limit to 1500M but it will emit full lines (won't split a line in the middle).
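
If the split tool is not at hand, the same idea (cap each part by size but never cut a line, so no triple is broken) can be sketched in a few lines of Python. This is an illustration only: the function name `split_ntriples` and the part-file naming are made up for this example, and it relies on N-Triples having one triple per line.

```python
from pathlib import Path


def split_ntriples(src, out_dir, max_bytes=1500 * 1024 * 1024):
    """Split a line-based RDF file into parts of at most max_bytes each.

    Lines are never broken, so every part stays valid N-Triples.
    A single line longer than max_bytes ends up alone in its own part.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    written = 0
    with src.open("rb") as fin:
        for line in fin:
            # Start a new part when the current one would overflow.
            if out is None or (written and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                part = out_dir / f"{src.stem}.part{len(parts):03d}{src.suffix}"
                parts.append(part)
                out = part.open("wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts


# Hypothetical usage (paths are placeholders):
# split_ntriples("TGNOut_2Terms.nt", "chunks")
```

Each part can then be loaded separately with the same load command.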

Boian Simeonov

Oct 12, 2015, 6:27:50 AM
to Getty Vocabularies as Linked Open Data
A little explanation:
 - The LoadRDF tool works offline; it is useful if you load data from scratch
 - Getting Started can work offline or online. This gives you the possibility to load data into live repositories
 - console.sh can also load data into live repositories

Vladimir Alexiev

Oct 12, 2015, 11:22:19 AM
to Getty Vocabularies as Linked Open Data, boian.s....@gmail.com, vlad...@sirma.bg, gga...@getty.edu

The XSD Datatypes spec has a specific note about infinite datatypes that states the following about gYear:

All ·minimally conforming· processors must support nonnegative ·year· values less than 10000 (i.e., those expressible with four digits) 

GDB Workbench uses Sesame 2.7, which implements minimally conforming support for years: it throws an exception whenever it sees a gYear below -9999. This restriction is removed in Sesame 2.8. We have a task to upgrade GDB and GDB Workbench to Sesame 2.8, but this is not so simple and will take more time. So for now, use one of the described workarounds.
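
To see in advance which lines of a dump would trip this check, the file can be scanned for gYear literals with more than four digits. The sketch below is an assumption-laden illustration, not part of any Getty or GraphDB tooling: the name `find_wide_gyears` is made up, the 9999 threshold is taken from the description above, and literals carrying a timezone suffix are not matched.

```python
import re

# Matches a typed literal such as "-10000"^^<http://www.w3.org/2001/XMLSchema#gYear>
GYEAR = re.compile(
    r'"(-?\d+)"\^\^<http://www\.w3\.org/2001/XMLSchema#gYear>'
)


def find_wide_gyears(lines, limit=9999):
    """Yield (line_number, year) for gYear literals whose absolute value exceeds limit."""
    for number, line in enumerate(lines, start=1):
        for match in GYEAR.finditer(line):
            year = int(match.group(1))
            if abs(year) > limit:
                yield number, year


# Hypothetical usage on one of the files from this thread:
# with open("ULANOut_Biographies.nt", encoding="utf-8") as f:
#     for number, year in find_wide_gyears(f):
#         print(number, year)
```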

Nabeel Ahmed

Oct 12, 2015, 11:59:59 PM
to Vladimir Alexiev, Getty Vocabularies as Linked Open Data, Boian Simeonov, gga...@getty.edu
Thanks for sharing. I have used the method provided by Boian, as it supports chunking in batches of 500000 triples, and that value can be changed as required.

regards,
