Recent changes to JSONLD on TGN - problems

Barry Pearce

Jul 6, 2022, 8:22:33 AM
to Getty Vocabularies as Linked Open Data
Hi all,

I import data using the JSONLD formats from the TGN, AAT, and ULAN.

Up until recently (the last couple of weeks or so) all three jsonld formats had a similar structure. However, the TGN appears to have changed.

Most notably, the record is no longer provided as an array, and many values previously presented as arrays are now objects.
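
One way to tolerate both shapes while the files are in flux is to normalise every value to a list before reading it. The following is a minimal Python sketch, not the importer described in this thread (that one is Java with Jackson). The helper names `as_list` and `first_record` and the sample records are hypothetical and are not actual TGN output.

```python
import json


def as_list(value):
    """Return value as a list: None -> [], list -> unchanged, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def first_record(document):
    """Accept a top-level array or a single object and return the first record (or None)."""
    records = as_list(document)
    return records[0] if records else None


# Hypothetical samples: the same record in an array-based and an object-based layout.
old_style = json.loads('[{"id": "tgn:1", "label": [{"value": "A"}, {"value": "B"}]}]')
new_style = json.loads('{"id": "tgn:1", "label": {"value": "A"}}')

old_labels = [item["value"] for item in as_list(first_record(old_style).get("label"))]
new_labels = [item["value"] for item in as_list(first_record(new_style).get("label"))]
```

In a Jackson-based importer, `DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY` may give a similar effect for fields mapped to collections, though whether it fits a given object mapping is an assumption to verify.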

I have some questions:

Is this the new Linked.Art format?
If so, I thought this change was going through in August?
If not, where can I find documentation about the structure changes?
Whilst I have tried to support the new format, I have a problem: I cannot see from the JSONLD record how non-preferred place types are qualified with flags. The flags no longer seem to be part of this new data format, so I cannot determine whether a place type such as abbey is historical or current.

Thanks in advance!
Cheers
Barry Pearce
Bowed Strings Iconography Project

Getty Vocabularies LOD

Jul 6, 2022, 12:37:29 PM
to Getty Vocabularies as Linked Open Data
Hi, Barry.

Thanks for reporting this issue. The problem is due to space constraints on the server where these prebuilt JSONLD entity files are hosted and the growing size of TGN. We are working on resolving the issue, but would it be possible for you to explain how you are processing these JSONLD files? Here is an example of the Linked.Art JSONLD that will be the default JSON and JSONLD model in the future: https://data.getty.edu/vocab/tgn/7016833

We would be interested to know if the Linked.Art modeling addresses the needs of your project, and if not, why. The changeover to Linked.Art for JSON and JSONLD serializations is still scheduled to happen in early August. Backwards compatibility for the SKOS/Schema.org modeled data will still be available after this change happens.

Gregg Garcia
Software Architect
Getty Digital / J. Paul Getty Trust

Vladimir Alexiev

Jul 6, 2022, 5:52:27 PM
to Getty Vocabularies LOD, Getty Vocabularies as Linked Open Data
Some embedded links are not quite ok.
E.g. http://vocab.getty.edu/tgn/aggregations/7016833/hierarchy leads to a page with 9 triples. Maybe it could be changed to a URL that uses an anchor?


Barry Pearce

Jul 7, 2022, 10:16:49 AM
to Getty Vocabularies as Linked Open Data
Thanks for the reply Gregg.

My (musical instrument) iconography database is expected to grow to around 50K-100K sources. Rather than being imported in bulk, records are imported by curators on an as-required basis. The curator requests an import using my web UI, which takes the ID and pulls the JSONLD from the TGN, AAT and ULAN as required. The required data is extracted from the JSONLD and stored via object mapping to an RDBMS. This is implemented in Java with Spring and the Jackson JSON libs.

So the processing on the backend is an HTTP request to load the file, which is then parsed by the Jackson libs for easy processing in Java. For a TGN import the following are extracted:
  • All names, including vernacular and historical flags and the preferred name.
  • WGS84 lat/long in decimal
  • The location types (from the AAT) are identified within the database, and if not present are also imported (again through a similar process, which pulls in the names and the notes in the same manner). I also use the preferred flags to identify a primary type.
  • The notes.
  • The parent in the hierarchy is also identified within the database (based on TGN ID) and, if not present, will be imported using this same process.
A more complex trigger of an import from the Getty vocabs is importing a person/organisation from the ULAN. Again it is a similar process as above, however it can result in multiple imports from the TGN (again vertically upward) to satisfy the place of birth, place of death, and also the nationalities may be imported from the AAT.

In a simple scenario where the data for the places/nationalities already exists in my database, only the ULAN JSONLD is requested. A more complex scenario could see the upwards hierarchy of two locations and all associated location types imported as well.
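
The import-on-demand pattern described above (fetch a record, then walk upward and fetch only the ancestors that are not already stored) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Java/Spring implementation from this thread: the `parent` field, the dictionary used as a store, and the `fake_fetch` stand-in for the HTTP request are all hypothetical.

```python
def import_place(tgn_id, store, fetch):
    """Import a place and any missing ancestors; return the IDs imported, child first."""
    imported = []
    current = tgn_id
    # Stop as soon as we reach a record that is already stored, or the top of the hierarchy.
    while current is not None and current not in store:
        record = fetch(current)          # one remote request per missing record
        store[current] = record
        imported.append(current)
        current = record.get("parent")   # walk upward only, never into children
    return imported


# Invented stand-in data; real records would come from the vocabulary service.
remote = {
    "abbey": {"name": "Some Abbey", "parent": "town"},
    "town": {"name": "Some Town", "parent": "nation"},
    "nation": {"name": "Some Nation", "parent": None},
}
requests_made = []


def fake_fetch(tgn_id):
    """Hypothetical fetcher that records each call instead of using the network."""
    requests_made.append(tgn_id)
    return remote[tgn_id]


store = {"nation": remote["nation"]}                 # the nation was imported earlier
first = import_place("abbey", store, fake_fetch)     # fetches the abbey and the town
second = import_place("abbey", store, fake_fetch)    # nothing left to fetch
```

Because the loop stops at the first ancestor already present, the number of requests depends on how much of the hierarchy is missing locally, which matches the simple and complex scenarios described above.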

I have to say the Getty Vocabs have been invaluable in speeding up data entry, and my system remains aligned with the vocab through periodic updates.

Looking at the new Linked.Art JSON, my initial impression is that I would be looking to use the SKOS/Schema.org.  There are a number of factors that would cause me to take that route.

  • The Linked.Art data file is huge compared to the amount of data actually required. This example is 240KB, so just 4 requests will pull in almost 1MB of data for processing, yet the actual data extracted will be tiny in comparison. Alas, I don't have a TGN JSONLD from before the change, but the ULAN file for 500057165 comes in at a mere 43.2KB. Much of the extraneous information for me is things like the very detailed list of sources of the names (referred_to_by).
  • There is also a potential issue with the parts: if I pull in Europe, does this mean the part section will pull in vast volumes of data? I can see how this might be useful for discovery/exploration of the data set, but in my scenario I am only interested in the upwards hierarchy (part_of).
  • The hierarchy upwards is not qualified, so the current split between hierarchical position and additional parents has been lost. In general I am only interested in the hierarchical position.
  • Names and (location) types are not qualified with the historical flags, nor are they identified as preferred/primary (the vernacular flags are missing as well, but this is less of an issue compared to the other two). For me it is very useful to know that a location used to be a monastery but is now a museum. When dealing with artists' names, the preferred names provide a standardisation, ensuring consistent labelling.

The lack of detailed qualification of names and hierarchy would be the primary reason to use the SKOS schema. The verbosity of the data (data size) is a secondary consideration because volumes are low; however, the end user is waiting for the imports to complete, and this does tie up more resources and will impact performance to some degree. Even if the Linked.Art schema had the required data, I would probably still investigate the other methods of accessing the vocabularies in order to reduce the overhead. Low overhead was one of the reasons why I chose to use the JSONLD in the first place (especially as JSON is the data format used for REST with my database server, so JSONLD infrastructure was already available on the server): it contains all the information from the Full Record Display (albeit it is sometimes necessary to make subsequent requests to obtain the full data, but it is all obtainable) and yet yields a very low data size. This is fast and efficient, and has been great to work with.

Unfortunately the current issues on the TGN have made the TGN JSONLD unusable for me - I cannot code around them (the missing flags). My only choice currently is to write a new importer for another format, unless the TGN JSONLD is to be returned to its former fullness in the very near future. In the meantime I am left having to import via cut & paste. :(

Now all this seems rather negative, but I am still a great fan of the Getty Vocabs and the huge benefit they give in terms of time saved through import and also through standardisation of the vocabularies.

Perhaps I should move to access the vocabs using the SPARQL endpoint? I assume the flags will remain available using this endpoint? 
Thoughts/advice appreciated.

Cheers
Barry

Robert Sanderson

Jul 7, 2022, 10:58:38 AM
to Barry Pearce, Getty Vocabularies as Linked Open Data
Hi Barry, all,

To be clear about the relationship with Linked Art... (https://linked.art/) ...

* The format in the data is quite an early representation, and many of the issues you bring up have been addressed in more recent versions.  In particular, the API definition for Places only has `part_of` and not `part`. (See https://linked.art/api/1.0/endpoint/place/) This would alleviate much of the record size issue.

* At Yale, we have implemented internal "profiles" to manage data size. These are named subsets of the full record. This pattern could help as well, as a "basic" profile could be expressed that strips out the references (which I also never find helpful) or other extraneous information. Profiles are not part of the specification (yet).

* Linked Art does allow qualifications on names, such as primary or alternate. See: https://linked.art/api/1.0/shared/name/ ... the type would go in the classified_as field.

* The one issue that the current specifications wouldn't solve is the different hierarchy types. In this case, the suggested approach would be to retrieve all of the part_of references and only keep the ones that you actually want, based on their classified_as types. Completely understood that this is quite annoying with such large records, but that could be solved as above.
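
The two filtering steps mentioned in the points above (picking names by their `classified_as` type, and keeping only the `part_of` references of a wanted type) might look like the following Python sketch. The sample record is invented and is not actual Getty output. `HIERARCHY_TYPE` is a hypothetical identifier, and the assumption that each `part_of` reference carries its own `classified_as` list should be checked against real records. The primary-name identifier is the AAT "preferred terms" concept that the Linked Art documentation uses for primary names.

```python
PRIMARY_NAME = "http://vocab.getty.edu/aat/300404670"  # AAT concept used for primary names
HIERARCHY_TYPE = "https://example.org/types/preferred-hierarchy"  # hypothetical identifier


def has_type(entity, type_uri):
    """True if the entity's classified_as list contains a type with the given id."""
    return any(t.get("id") == type_uri for t in entity.get("classified_as", []))


def primary_names(record):
    """Return the content of every Name classified as a primary name."""
    return [n["content"] for n in record.get("identified_by", [])
            if n.get("type") == "Name" and has_type(n, PRIMARY_NAME)]


def parents_of_type(record, type_uri):
    """Return the ids of part_of references classified with the given type."""
    return [p["id"] for p in record.get("part_of", []) if has_type(p, type_uri)]


# Invented sample record in the general shape of a Linked Art Place.
sample = {
    "id": "https://example.org/place/abbey",
    "type": "Place",
    "identified_by": [
        {"type": "Name", "content": "Example Abbey",
         "classified_as": [{"id": PRIMARY_NAME, "type": "Type"}]},
        {"type": "Name", "content": "Old Abbey"},
    ],
    "part_of": [
        {"id": "https://example.org/place/town", "type": "Place",
         "classified_as": [{"id": HIERARCHY_TYPE, "type": "Type"}]},
        {"id": "https://example.org/place/diocese", "type": "Place"},
    ],
}

names = primary_names(sample)
parents = parents_of_type(sample, HIERARCHY_TYPE)
```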

An approach to get back to the JSON-LD form would be to retrieve the Turtle serialization and run it through a JSON-LD converter - https://github.com/filip26/titanium-json-ld seems maintained and up to date with the JSON-LD specifications.

Hope that helps,

Rob


--
Rob Sanderson
Director for Cultural Heritage Metadata
Yale University

Getty Vocabularies LOD

Jul 7, 2022, 11:38:30 AM
to Getty Vocabularies as Linked Open Data
Thanks for the detailed explanation of how you are using the Vocabularies LOD data, Barry. As Rob mentions, this is an early and very basic implementation of Linked.Art that we will enhance as the specification matures and we evaluate use-cases such as yours.

In regard to your current issue, you could use the SPARQL endpoint to generate the full entity serialization as we do. In the upcoming version of the Vocab LOD web site, we will run this CONSTRUCT query dynamically instead of prebuilding the full set of entity files for each serialization, in order to reduce the amount of space needed to hold this ever-growing set of static files.

The CONSTRUCT query we use is the following, with the entity ID bound to the same TGN record I used before (tgn:7016833):
```
PREFIX tgn: <http://vocab.getty.edu/tgn/>
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX iso: <http://purl.org/iso25964/skos-thes#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX aat: <http://vocab.getty.edu/aat/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT {
  ?s  ?p1 ?o1. # subject
  ?ac ?p2 ?o2. # change action
  ?t  ?p3 ?o3. # term/note
  ?ss ?p4 ?o4. # subject local source
  ?ts ?p6 ?o6. # term/note local source
  ?st ?p7 ?o7. # statement about relations/placeTypes
  ?ar ?p8 ?o8. # anonymous array of subject
  ?l1 ?p9 ?o9. # list element of subject
  ?l2 ?pA ?oA. # list element of anonymous array
  ?pl ?pB ?oB. # place
  ?ge ?pC ?oC. # geometry
} WHERE {
  BIND (tgn:7016833 as ?s)
  {?s ?p1 ?o1 FILTER(!isBlank(?o1) &&
              !(?p1 in (gvp:narrowerExtended, skos:narrowerTransitive, skos:semanticRelation)) && ?s!=?o1)}
  UNION {?s skos:changeNote ?ac. ?ac ?p2 ?o2}
  UNION {?s dct:source ?ss. ?ss a bibo:DocumentPart. ?ss ?p4 ?o4}
  UNION {?s skos:scopeNote|skosxl:prefLabel|skosxl:altLabel ?t.
     {?t ?p3 ?o3 FILTER(!isBlank(?o3))}
     UNION {?t dct:source ?ts. ?ts a bibo:DocumentPart. ?ts ?p6 ?o6}}
  UNION {?st rdf:subject ?s. ?st ?p7 ?o7}
  UNION {?s skos:member/^rdf:first ?l1. ?l1 ?p9 ?o9}
  UNION {?s iso:subordinateArray ?ar FILTER NOT EXISTS {?ar skosxl:prefLabel ?t1}.
     {?ar ?p8 ?o8}
     UNION {?ar skos:member/^rdf:first ?l2. ?l2 ?pA ?oA}}
  UNION {?s foaf:focus ?pl.
     {?pl ?pB ?oB}
     UNION {?pl schema:geo ?ge. ?ge ?pC ?oC}}
}
```
This query will work for any base AAT, TGN or ULAN entity ID. To get the JSONLD format, use the endpoint:

Let me know if this approach works for you.
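
For readers who want to script this, the sketch below shows one way to rebind the subject and build the HTTP request, following the GET form of the SPARQL 1.1 Protocol (a URL-encoded `query` parameter plus an `Accept` header). It is a hedged Python illustration only: the endpoint URL is a placeholder, since none is given in this message, and whether the service answers `application/ld+json` via content negotiation is an assumption. A full IRI is substituted for the subject because the query above declares the `tgn:` and `aat:` prefixes but no `ulan:` prefix.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request


def bind_subject(query, vocab, entity_id, placeholder="tgn:7016833"):
    """Replace the bound subject in the query text with a full Getty vocabulary IRI."""
    if vocab not in ("aat", "tgn", "ulan"):
        raise ValueError("unknown vocabulary: %r" % vocab)
    if not str(entity_id).isdigit():
        raise ValueError("entity ID must be numeric: %r" % entity_id)
    if placeholder not in query:
        raise ValueError("placeholder subject not found in query")
    return query.replace(placeholder, "<http://vocab.getty.edu/%s/%s>" % (vocab, entity_id))


def build_request(endpoint, query, accept="application/ld+json"):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


# Shortened stand-in for the CONSTRUCT query above; pass the full text in practice.
SAMPLE = "CONSTRUCT { ?s ?p ?o } WHERE { BIND (tgn:7016833 as ?s) ?s ?p ?o }"

bound = bind_subject(SAMPLE, "ulan", "500057165")
# Placeholder endpoint: substitute the real SPARQL endpoint URL.
request = build_request("https://example.org/sparql", bound)
```

Sending the request is then a matter of `urllib.request.urlopen(request)` and decoding the response body explicitly (for example as UTF-8) before parsing it as JSON.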

Gregg Garcia
Software Architect
Getty Digital / J. Paul Getty Trust

Vladimir Alexiev

Jul 7, 2022, 6:41:07 PM
to Getty Vocabularies LOD, Getty Vocabularies as Linked Open Data
Hi! I think the last shown query is slightly incorrect, as it will not return all data about a ULAN subject.

I think this query returns all data:
http://vocab.getty.edu/doc/queries/#All_Data_For_Subject.

Cheers!

Getty Vocabularies LOD

Jul 8, 2022, 11:04:46 AM
to Getty Vocabularies as Linked Open Data
Thanks, Vladimir.
Indeed the query I posted is missing a few ULAN properties.

Barry - you should use the CONSTRUCT for the full subject in the sample queries documentation: http://vocab.getty.edu/doc/queries/#All_Data_For_Subject

Barry Pearce

Jul 14, 2022, 6:26:17 AM
to Getty Vocabularies as Linked Open Data

Thanks for the information everyone.

I have now moved the data retrieval to be SPARQL based. This reduces the data transferred significantly as I can target exactly what I need. I am sure the move to Linked.Art will need some tweaks but using SPARQL resolves some of the issues.

Whilst I have now resolved my immediate issue I am still left with a charset issue (which I worked around) and problems using the POST method (rather than GET). I shall add separate conversations for these.

Thanks for the help, much appreciated!

Barry

Barry Pearce

Jul 14, 2022, 8:35:09 AM
to Getty Vocabularies as Linked Open Data
Hi Rob,

Thanks for that info. Very useful.

Many of my concerns about the Linked.Art-based data related to what was provided in the default JSONLD files - moving to SPARQL with a JSONLD response has now alleviated many of these issues going forwards. I have had to learn SPARQL in the process, but that is certainly not a bad thing!

As long as the Getty Vocab continues to support preferred and non-preferred in the labels, and part_of (qualified via classified_as types), then it will be a straightforward coding exercise to move to a Linked.Art-based vocab, and it looks like Linked.Art will fulfil my needs.


Cheers
Barry Pearce
Bowed Strings Iconography Project


Getty Vocabularies LOD

Sep 13, 2022, 1:47:30 PM
to Getty Vocabularies as Linked Open Data
Following up on this issue -

The new version of the Vocab LOD site (vocab.getty.edu) has been deployed with the Linked.Art modeled Getty Vocabularies data available for JSON and JSONLD serializations of top-level Vocab entities.

Additionally, the TGN Linked.Art modeled data has been updated:
Gregg Garcia
Software Architect
Getty Digital / J. Paul Getty Trust
