ULAN display name

David Lowe

Jun 4, 2015, 11:32:07 AM
to gettyv...@googlegroups.com
I'm trying (& failing) to query out ULAN names in the display, not the preferred (last, first), form. Using this basic sample query, I've played with the parameters to no avail. But then, I'm completely, absolutely new to SPARQL. Help?

select * {
  filter exists {?x gvp:agentType/gvp:broaderGenericExtended aat:300025101}
  ?x gvp:prefLabelGVP/xl:display ?name;    
     foaf:focus/gvp:biographyPreferred [
       schema:description ?bio]}

David Lowe

Jun 4, 2015, 11:36:59 AM
to gettyv...@googlegroups.com
And, related: how to get the bio as just the Nationality and Dates, without the roles?

Vladimir Alexiev

Jun 7, 2015, 3:40:50 AM
to gettyv...@googlegroups.com, davi...@nypl.org
You have to look for gvp:termDisplay=Display, see http://vocab.getty.edu/doc/#Term_Characteristics:
select * {
  ?x skos:inScheme ulan:.
  filter exists {?x gvp:agentType/gvp:broaderGenericExtended aat:300025101}
  ?x xl:prefLabel|xl:altLabel [
    gvp:termDisplay <http://vocab.getty.edu/term/display/Display>;
    xl:literalForm ?name].
  ?x foaf:focus/gvp:biographyPreferred/schema:description ?bio
} limit 100


Most, but not all, ULAN agents have a display name. Compare this count (222879):
select (count(*) as ?c) {
  ?x skos:inScheme ulan:.
  filter exists {?x (xl:prefLabel|xl:altLabel)/gvp:termDisplay <http://vocab.getty.edu/term/display/Display>}}

to the total (240291):
select (count(*) as ?c) {
  ?x skos:inScheme ulan:}

A few agents have several display names (222889 names):
select (count(*) as ?c) {
  ?x skos:inScheme ulan:;
     (xl:prefLabel|xl:altLabel)/gvp:termDisplay <http://vocab.getty.edu/term/display/Display>}
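
To see which agents account for the difference between the two display counts, a grouped variant of the same pattern may help. This is only a sketch: it reuses the properties from the queries above, and the PREFIX lines are spelled out on the assumption that you may run it somewhere that does not predefine them (the Getty endpoint does).

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xl:   <http://www.w3.org/2008/05/skos-xl#>
PREFIX gvp:  <http://vocab.getty.edu/ontology#>
PREFIX ulan: <http://vocab.getty.edu/ulan/>

# List ULAN agents that carry more than one label flagged as Display
select ?x (count(distinct ?label) as ?c) {
  ?x skos:inScheme ulan: ;
     xl:prefLabel|xl:altLabel ?label .
  ?label gvp:termDisplay <http://vocab.getty.edu/term/display/Display>
}
group by ?x
having (count(distinct ?label) > 1)
```

If you need exactly one name per agent, you would have to pick among these yourself, for example by taking the first one returned.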


Vladimir Alexiev

Jun 7, 2015, 3:50:19 AM
to gettyv...@googlegroups.com, davi...@nypl.org
> how to get the bio as just the Nationality and Dates

See http://vocab.getty.edu/doc/#ULAN_Overview and get your bearings in the graph. I assume you mean birth/death, and we'll get them out of the pref bio (but mind you, often these are estimated with wide ranges of 120-200 years, see http://vocab.getty.edu/doc/#Estimated_Dates). I also assume you want the pref nationality only. Eg for Rembrandt:

select * {
  ulan:500011051 foaf:focus [
    gvp:nationalityPreferred/gvp:prefLabelGVP/xl:literalForm ?nat;
    gvp:biographyPreferred [
      gvp:estStart ?start;
      gvp:estEnd ?end]]}
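
The same pattern can probably be combined with the display-name lookup to return name, preferred nationality, and estimated dates for many agents at once. The following is a sketch, not a tested recipe: it only stitches together properties already used in this thread, keeps the aat:300025101 agent-type filter from the original question, and writes out PREFIX lines in case your client does not predefine them.

```sparql
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX xl:    <http://www.w3.org/2008/05/skos-xl#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX gvp:   <http://vocab.getty.edu/ontology#>
PREFIX ulan:  <http://vocab.getty.edu/ulan/>
PREFIX aat:   <http://vocab.getty.edu/aat/>

select ?x ?name ?nat ?start ?end {
  ?x skos:inScheme ulan: .
  filter exists {?x gvp:agentType/gvp:broaderGenericExtended aat:300025101}
  # Display-form name (agents without one are dropped)
  ?x xl:prefLabel|xl:altLabel [
       gvp:termDisplay <http://vocab.getty.edu/term/display/Display>;
       xl:literalForm ?name];
     # Preferred nationality and estimated dates from the preferred biography
     foaf:focus [
       gvp:nationalityPreferred/gvp:prefLabelGVP/xl:literalForm ?nat;
       gvp:biographyPreferred [
         gvp:estStart ?start;
         gvp:estEnd ?end]]
} limit 100
```

Agents with several display names will come back as several rows, one per name.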


Gabriel Kerneis

Jun 10, 2015, 3:45:58 AM
to Vladimir Alexiev, gettyv...@googlegroups.com, davi...@nypl.org
On Sun, Jun 7, 2015 at 9:50 AM, Vladimir Alexiev <vlad...@sirma.bg> wrote:
>> how to get the bio as just the Nationality and Dates
>
> See http://vocab.getty.edu/doc/#ULAN_Overview and get your bearings in the graph. I assume you mean birth/death, and we'll get them out of the pref bio (but mind you, often these are estimated with wide ranges of 120-200 years, see http://vocab.getty.edu/doc/#Estimated_Dates).

Even for biographies of artists from the last few centuries? I assumed the "estimated dates" approximation applied mostly to dates where there is not a reliable source of truth.

Thanks,

Gabriel

Joan Cobb

Jun 10, 2015, 9:38:32 AM
to gettyv...@googlegroups.com, davi...@nypl.org
Historically, these dates were only used by our own programmers for searching and never displayed to the end user. As much as possible they derive from the contributed known biographies. When they agree, the dates match. When they don’t, we use a set of rules to determine the span. For example: when an end date is indicated as uncertain in a source, we often widen the range. When an end date is unknown or estimated as a future date, it is often set to start + 100 years.

David Lowe

Jun 10, 2015, 10:53:14 AM
to Joan Cobb, gettyv...@googlegroups.com
Yes, thanks, and that's how I'd be using them, too. I was hoping for a discrete display date, like "active 1890s" or "1839-1902". But it seems that is only within the bio, which is a mashup of nationality (and not always the preferred one!), roles, and dates. At any rate, I can use these start and end dates to compare to the display dates I've extracted from the bios. Anyway, I've been able to hack together the names, nationalities & dates in a way which matches our schema (our names are all in a TMS database). I'd already matched ULAN's photographers against our database of 110,000 photographer bios, but now I'm trying to cast a wider net....

--
You received this message because you are subscribed to the Google Groups "Getty Vocabularies as Linked Open Data" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gettyvocablo...@googlegroups.com.
To post to this group, send email to gettyv...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gettyvocablod/95675301-96b0-4219-98e4-05fb720aa0b3%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Vladimir Alexiev

Jun 11, 2015, 6:02:00 AM
to gettyv...@googlegroups.com, ker...@google.com
> I assumed the "estimated dates" approximation applied mostly to dates where there is not a reliable source of truth.

Modern usually means "dates are better known", but not always. See http://vocab.getty.edu/doc/queries/#Ancient_Artists_or_Groups_by_Nationality and try start date 0010 rather than -0001. There are some biographies with date range 0001…2090. GVP is working towards narrowing such ranges, but it takes a lot of editorial effort.
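
To find such wide ranges yourself, something along these lines might work. It is a rough sketch under two assumptions that are not from this thread: that gvp:estStart and gvp:estEnd can be cast to integers via their string form, and that 150 years is a useful cut-off (pick whatever threshold suits you).

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX gvp:  <http://vocab.getty.edu/ontology#>
PREFIX ulan: <http://vocab.getty.edu/ulan/>

select ?x ?start ?end ?span {
  ?x skos:inScheme ulan: ;
     foaf:focus/gvp:biographyPreferred [
       gvp:estStart ?start;
       gvp:estEnd ?end] .
  # Assumption: the year values cast cleanly to integers via str()
  bind (xsd:integer(str(?end)) - xsd:integer(str(?start)) as ?span)
  # Assumption: 150 is an arbitrary illustrative threshold
  filter (?span > 150)
} limit 100
```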

Vladimir Alexiev

Jun 11, 2015, 6:07:36 AM
to gettyv...@googlegroups.com, davi...@nypl.org
> hoping for a discrete display date, like "active 1890s" or "1839-1902". But it seems that is only within the bio, which is a mashup of nationality (and not always the preferred one!), roles, and dates.

Do you have some examples where schema:description has more date info than estStart and estEnd?


> I'd already matched ULAN's photographers against our database of 110,000 photographer bios, but now I'm trying to cast a wider net....

Could you please describe this coreferencing in a blog? 
Have you looked at Wikidata Mix-n-match?

Cheers! V

David Lowe

Jun 11, 2015, 10:42:28 AM
to Vladimir Alexiev, gettyv...@googlegroups.com
Vladimir,
Mix'n'Match looks great, thanks for the heads-up! Just looking at it now, so I'm not sure how it works, or what the downloaded data looks like, but I'll be exploring that shortly, for sure. I'll get back to you re: schema:description in a separate response.

As for what (and how) I'm doing with photographer names:
I have, for about 12 years now, been collecting, editing, and researching photographers' biographies. I started with an old telnet database hosted by the George Eastman House. That db had ~93,000 biographies. I was able to copy and paste the Name, Nationality & Dates of all 93,000 (20 entries at a time!) into a spreadsheet very shortly before that database permanently went offline. There was a lot of duplication, and a lot of entries with too little info to trifle with, and I ended up with a core set of about 65,000 names. (The GEH data has since resurfaced in this db, run independently by the former editors from GEH. It's an invaluable set of info which I use daily, but in a sadly outdated database.)
Over the years, I've continued to refine and grow my photographer biographies by checking my list (no longer a spreadsheet, but in our TMS database) against various authorities, both print and online, and research in censuses, city directories, &c. While the scope covers the entire history (and some pre-history) of photography (ca. 1820s-present), the bulk are 19th to mid 20th century photographers. Called PIC (the Photographers' Identities Catalog), I hope to get it properly online this year, but I don't have any specific projections, as I'm at the mercy of our much in-demand programmers. In the meantime, here's a map and a bit more background on a sad little Google site I made. I am now working with a programmer who is helping with a more sophisticated map, the goal being that you could ask for female daguerreotypists in Kansas, for instance, then click through those points to fuller biographical entries.

Our TMS database is shared between the Prints and Photographs departments and is used for our cataloging of our collections, and so it contains many more names than just my PIC names (not just printmakers, but donors, subjects, etc.). So the risk of duplication is high. I use a lot of SQL queries to spot dupes, then merge or delete them. And frankly, I do a whole lot of spreadsheet work to clean and normalize data, and use the vlookup function to compare & merge data from various sources. Not particularly hi-tech but it has served me well. But now I'm trying to reconcile about 220,000 names in our database to the entirety of ULAN. I've gotten about 20,000 exact matches, but now I need to do some fuzzy matching to find more. I've been trying to use OpenRefine with Reconcile-CSV, but it is (VERY SLOWLY) exploding my poor computer.
