count of TGN places/terms by source

Karl Grossner

Jan 27, 2018, 2:08:20 PM
to Getty Vocabularies as Linked Open Data
Hello,

I'm currently working with both the 'explicit' dumps and the SPARQL endpoint, and am now trying to filter TGN terms/places by source. I have a list of all ~3900 sources, but need a grouping query that returns 'source-id, source-label, count(*)'.

I've tried the following, but get the unhelpful browser error "The connection was reset". I get that a lot from the endpoint, though many queries do work.

select ?source ?en ?nl ?c {
  {select ?source (count(*) as ?c) {
     ?term <http://purl.org/dc/terms/source> ?source .
  } group by ?source order by desc(?c) limit 100}
  ?source gvp:prefLabelGVP [xl:literalForm ?en].
  optional {?source xl:prefLabel [xl:literalForm ?nl; dct:language gvp_lang:nl]}}


I've also tried simply getting all term_id/source_id pairs, planning to put them in a database and do the grouping and counting there. But although the browser reports a count of 2.6 million pairs, a CSV download returns only 624,262 of them. Another mystery.


Any help or suggestions appreciated!

Richard Light

Jan 29, 2018, 6:26:09 AM
to gettyv...@googlegroups.com

Karl,

If you're using the default endpoint [1], it might help to include a test for TGN-only concepts in your query:

  ?term skos:inScheme <http://vocab.getty.edu/tgn/> .

When I add that criterion, I don't find any sources with the gvp:prefLabelGVP property, though there are clearly plenty in the AAT.  What search did you use to get your 3900 sources?

Best wishes,

Richard

[1] http://vocab.getty.edu/sparql

Karl Grossner

Jan 29, 2018, 9:00:22 AM
to Getty Vocabularies as Linked Open Data

Richard,

Thanks. I originally got the TGN sources from a dump file, but I can also get them with the following query (the sources don't have an inScheme property):

select ?id ?short_title ?title ?license {
    ?x  void:inDataset <http://vocab.getty.edu/dataset/tgn> ;
        rdf:type bibo:Document ;
        dc:identifier ?id ;
        dcterms:title ?title ;
        bibo:shortTitle ?short_title ;
        dcterms:license ?license .
}

I also have the dump file TGNOut_SourceRels.nt, so I’m going to extract <term, source, sourceid> triples from that, put them in a database and do my counts there; the filtered set would wind up in the database anyway.
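
A minimal sketch of that counting step in Python, in case it is useful to anyone following the thread. It assumes the dump links terms to sources with the same dcterms:source predicate used in the query earlier in this thread, and that both ends of each link are IRIs; I have not checked this against the actual layout of TGNOut_SourceRels.nt. The function name and the sample IRIs are made up for illustration.

```python
import re
from collections import Counter

# Assumption: term-to-source links use dcterms:source, as in the
# SPARQL query earlier in this thread.
DCT_SOURCE = "http://purl.org/dc/terms/source"

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
# Lines with literal or blank-node objects do not match and are skipped.
TRIPLE_RE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def count_terms_by_source(lines, predicate=DCT_SOURCE):
    """Count matching triples per object IRI (i.e. per source).

    Each matching line is counted once, so a term/source pair that is
    repeated in the input is counted more than once.
    """
    counts = Counter()
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match and match.group(2) == predicate:
            counts[match.group(3)] += 1
    return counts


# Made-up sample data; real IRIs in the dump will look different.
sample_lines = [
    "<http://example.org/term/1> <http://purl.org/dc/terms/source> <http://example.org/source/A> .",
    "<http://example.org/term/2> <http://purl.org/dc/terms/source> <http://example.org/source/A> .",
    "<http://example.org/term/3> <http://purl.org/dc/terms/source> <http://example.org/source/B> .",
    "<http://example.org/term/3> <http://example.org/other> <http://example.org/source/B> .",
    '<http://example.org/term/4> <http://purl.org/dc/terms/source> "not an IRI" .',
]

for source, n in count_terms_by_source(sample_lines).most_common():
    print(source, n)
```

For the real file, an open file handle can be passed in place of sample_lines, since the function only iterates over lines; the dump is then read one line at a time rather than loaded whole.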

thanks

Karl

vladimir...@ontotext.com

Feb 24, 2018, 5:52:18 AM
to Getty Vocabularies as Linked Open Data

The simplest count query gives the error "This site can’t be reached. The connection was reset."

select (count(*) as ?c) {
  ?concept skos:inScheme aat:
}

In contrast, this query works ok:

select * {
  ?concept skos:inScheme aat:
} limit 100

