entering the SNAPDRGN garden

Sebastian Rahtz

Jul 8, 2014, 6:06:55 PM
to ancient...@googlegroups.com

Now that the SNAP project has started to ingest finalized data from the initial core datasets, it is time to think about how to bring in material from the other partners. For some, this will be easy, as they already know how to make their data available in RDF form on the open web and simply need to follow the guidelines in the Cookbook. For others, quite a lot of work will be involved in getting their data SNAP-ready.

Following today's workshop at DH 2014, I have put up a blog post at http://snapdrgn.net/archives/240 which covers some of the material discussed in the "data preparation" breakout groups.

The post describes some of the stages you may go through, and some of the problems that you may meet.

I have divided the work into six steps:

  1. Decide whether you have a set of names, a set of attestations, or a prosopography
  2. Identify your records
  3. Establish the identities online
  4. Wrangle the data
  5. Transform the data
  6. Make the RDF available

--

Sebastian Rahtz      

Director (Research) of Academic IT

University of Oxford IT Services

13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431


I am nothing.

I shall never be anything.

I cannot wish to be anything.

Apart from that, I have in me all the dreams of the world.

Gabriel Bodard

Jul 15, 2014, 12:38:43 PM
to ancient...@googlegroups.com
Thanks, Sebastian, this is really useful.

One thing it might be worth focusing on a little more is the
discussion we had at the work-sprint in Edinburgh on what we mean by a
"prosopography" vs a "list of attestations".

In brief (I'll try to write a fuller blog post about this) we agreed
on two main definitions, with one important corollary from these:

1. A prosopography is a database or other collection of name or person
references that disambiguates and co-references mentions of the same
person *or has started the task of doing so.* Even if it is a
work-in-progress, or imperfectly executed, it is intended or
understood that in a perfect world, each entry in a prosopography
would be a unique person.

2. A list of attestations could be a list of names extracted from a
corpus, or a text with names marked-up or pointed out in some way, but
with no intention to link co-references or identify individuals in any
comprehensive way.

Both of these types of dataset have a potential relationship with
SNAP:DRGN -- "Prosopographies" will generate RDF of all their persons,
which will be ingested into the SNAP triplestore, identifiers assigned
to them, and (ideally) co-references with other datasets identified;
"Lists of Attestations" would on the other hand produce OAC
annotations aligning their name/person references to SNAP identifiers,
rather than being the source of new person records in their own right.
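
To make the difference between the two outputs concrete, here is a
minimal sketch in Python (standard library only) that writes N-Triples
for each case. Everything in it is illustrative: the example.org URIs
are made up, FOAF is used only as a stand-in for whatever person
vocabulary the Cookbook actually prescribes, and putting the text
passage in oa:hasTarget and the SNAP identifier in oa:hasBody is my
assumption about how such an annotation might be shaped, not an agreed
SNAP:DRGN pattern.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"   # stand-in vocabulary (assumption)
OA = "http://www.w3.org/ns/oa#"       # Open Annotation namespace


def triple(s, p, o, literal=False):
    """Format one N-Triples statement; o is a URI unless literal=True.

    Simplified: only backslashes and double quotes are escaped.
    """
    if literal:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


def prosopography_record(person_uri, name):
    """A prosopography contributes a person record of its own."""
    return [
        triple(person_uri, RDF_TYPE, FOAF + "Person"),
        triple(person_uri, FOAF + "name", name, literal=True),
    ]


def attestation_annotation(annotation_uri, passage_uri, snap_person_uri):
    """A list of attestations only points an existing reference at a
    SNAP identifier; it creates no new person record."""
    return [
        triple(annotation_uri, RDF_TYPE, OA + "Annotation"),
        triple(annotation_uri, OA + "hasTarget", passage_uri),
        triple(annotation_uri, OA + "hasBody", snap_person_uri),
    ]


if __name__ == "__main__":
    # All URIs below are hypothetical placeholders.
    for line in prosopography_record(
        "http://example.org/prosopography/person/1", "Aurelia"
    ):
        print(line)
    for line in attestation_annotation(
        "http://example.org/corpus/annotation/1",
        "http://example.org/corpus/text/12#name-3",
        "http://example.org/snap/person/42",
    ):
        print(line)
```

The point of the sketch is only that the first function mints a person
URI in the contributor's own namespace, whereas the second never does:
it refers to a person identifier that already exists.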

The other important decision that relates to this is that a
"prosopography" according to the definition above will be welcome to
contribute all of its person records to SNAP:DRGN even if they
overlap completely with existing content, i.e. even if the vast
majority of its contribution will be in the form of co-references.
This is partly because we don't want to discriminate against datasets
that are contributed later than others, but even more importantly,
because one of the most important roles SNAP:DRGN can play is as a
source of concordance between the original datasets.
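
As a rough illustration of what "concordance" could mean in practice,
the sketch below groups person identifiers from different datasets into
sets that refer to the same individual, given a list of pairwise
co-references. It is only a sketch under assumptions of my own: the
identifiers are invented, and it treats co-reference as transitive (if
A matches B and B matches C, all three are grouped), which is something
a real implementation would need to decide deliberately.

```python
from collections import defaultdict


def build_concordance(coreferences):
    """Group identifiers linked by pairwise co-references.

    coreferences: iterable of (id_a, id_b) pairs asserted to denote the
    same person. Returns a sorted list of sorted identifier groups.
    Uses a simple union-find, so links are treated as transitive
    (an assumption, not a stated SNAP:DRGN rule).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in coreferences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for identifier in parent:
        groups[find(identifier)].add(identifier)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical identifiers from three contributing datasets.
    pairs = [
        ("datasetA:1", "datasetB:7"),
        ("datasetB:7", "datasetC:3"),
        ("datasetA:2", "datasetC:9"),
    ]
    for group in build_concordance(pairs):
        print(group)
```

On this reading, a late-arriving dataset that overlaps entirely with
existing content still adds value, because every record it contributes
extends one of these groups.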

All best,

Gabby

--
Dr Gabriel BODARD
Researcher in Digital Epigraphy

Digital Humanities
King's College London
Boris Karloff Building
26-29 Drury Lane
London WC2B 5RL

Email: gabriel...@kcl.ac.uk
Tel: +44 (0)20 7848 1388
Fax: +44 (0)20 7848 2980

http://www.digitalclassicist.org/
http://www.currentepigraphy.org/