Since this project was proposed in August at the 7th International
Symposium on Runes and Runic Inscriptions in Oslo, this group hasn't
really seen much activity. So let's begin now.
[In case the file attachments don't come through, I've also shared them
with you via Google Docs, if you log in.]
Tarrin's proposal [attached: 'ProposalforanewRuneDatabase.pdf'] -
distributed at the symposium - outlines the goals of the project and
some of the reasons behind it. There are a number of technical
shortcomings in the runic database systems currently in use, for a
variety of reasons - some due to their age and some due to their
developmental history. Tarrin has also raised the issue of the
potential benefits of a collaboratively maintained resource. However,
what interests me most is the potential to improve upon the existing
databases in order to make the data we have better structured and more
useful, and to broaden the sorts of questions that can be asked of it -
what can be searched, and how. To this end, I've been working on a
relational data model for runic data, and a script to coerce the dataset
of the Samnordisk Runtextdatabas 'rundata' program into it, with a view
to producing a web-based (HTML, and Web API or SOAP) interface to the
data. This would hopefully also allow broader applications for the
dataset, including GIS and statistical methods, and could then be
further linked with other Internet-accessible datasets such as the
Nottingham Runic Dictionary <http://runic-dictionary.nottingham.ac.uk/>,
FMIS Fornsök <http://www.fmis.raa.se/> and the RAÄ's recent initiative
to digitise the volumes of 'Sveriges Runinskrifter'.
The data import is also supplemented with additional coordinate data
for some of the inscriptions which lack it in rundata, derived from data
collected by Tarrin Wills for the Skaldic Poetry Project
<http://skaldic.arts.usyd.edu.au/> and provided as compressed XML.
I've uploaded a copy of my proposed data structure [attached: 'Data
Model.pdf'], as well as the entity relationship model [attached:
'rundata.pdf'] of the database produced by my import program *as it
stands at the moment*. A subset of the resulting database itself is also
provided as an SQL dump [attached: 'rundata_2011-01-31-Sö-Br.sql.bz2']
from MySQL to illustrate the structure in practice (the sample contains
only inscriptions with signa beginning 'Sö' and 'Br'). I would very
much welcome input and comments on the proposed ERM and data structure.
The data model is informed largely by the sorts of data recorded in the
rundata files - although naturally it is intended to ultimately
accommodate complementary data from other runic datasets as well - and
by lessons learned as a result of my Master's dissertation on data
structures for storing data on runic inscriptions. For all intents and
purposes the data in the rundata files has a 'flat' structure (there are
in fact four files of text, and two more for bibliographic references
and non-textual data; it is in fact this data that is the 'flattest');
in reality the serialisation of what is really a more complex set of
relations into these flat files has inevitably resulted in a good deal
of redundancy and inconsistency in the data. This
duplication/redundancy is largely eliminated by the normalisation of the
proposed relational data structure, which attempts to store each datum
only once. Inconsistencies are corrected for, where possible, by the
import script. The presence of this redundancy and inconsistency is
simply a consequence of the way the data has been entered and stored -
machine-consumability and adherence to a standardised data-dictionary do
not seem to have been priorities, simply because there was no compelling
reason for them to be - and this in no way reflects poorly on the
Samnordisk Runtextdatabas project. In the course of writing the import
script, however, I have become painfully aware of the inconsistencies
and errors that need correcting, and have contributed a patch upstream to
Jan Owe (maintainer of the rundata program) for those that I had
encountered as of August 2010; I intend to send him another update shortly.
The relational structure is based primarily around *objects* rather than
texts, a distinction which the Samnordisk Runtextdatabas seems to draw
in some cases and to blur in others. One objekt may display multiple
texts; each text may in turn have multiple readings, interpretations,
translations/normalisations, etc. An example of a more complex set of
relationships is that of the cross-forms displayed by rune-stones as
defined by Linn Lager: one objekt (stone) may feature zero or many
crosses; each cross may be recorded as displaying zero or many styles
for *each* of the seven 'aspects' of cross typology identified by Lager.
I hope that the proposed relationships are clear from the ERM diagrams.
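To make the object-centred structure a little more concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not the schema from the attached ERM: the table names 'inscription' and 'reading', the column names, the signum 'X 1' and the row contents are all simplified placeholders invented for illustration. It only shows the shape of the relationship (one object, many texts, many readings per text).

```python
import sqlite3

# Hypothetical, simplified schema: one object has many texts (facets),
# and each text has many readings. All names are illustrative only.
SCHEMA = """
CREATE TABLE objekt (
    objektid INTEGER PRIMARY KEY,
    signum   TEXT NOT NULL UNIQUE
);
CREATE TABLE inscription (
    textid   INTEGER PRIMARY KEY,
    objektid INTEGER NOT NULL REFERENCES objekt(objektid),
    facet    TEXT NOT NULL
);
CREATE TABLE reading (
    readingid INTEGER PRIMARY KEY,
    textid    INTEGER NOT NULL REFERENCES inscription(textid),
    content   TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database holding one invented object."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO objekt VALUES (1, 'X 1')")
    db.executemany(
        "INSERT INTO inscription VALUES (?, ?, ?)",
        [(1, 1, "A"), (2, 1, "B")],
    )
    db.executemany(
        "INSERT INTO reading VALUES (?, ?, ?)",
        [
            (1, 1, "first reading of side A"),
            (2, 1, "alternative reading of side A"),
            (3, 2, "reading of side B"),
        ],
    )
    return db


def readings_per_text(db, signum):
    """Return (facet, number of readings) for each text on one object."""
    rows = db.execute(
        """SELECT t.facet, COUNT(r.readingid)
           FROM objekt o
           JOIN inscription t ON t.objektid = o.objektid
           LEFT JOIN reading r ON r.textid = t.textid
           WHERE o.signum = ?
           GROUP BY t.textid, t.facet
           ORDER BY t.facet""",
        (signum,),
    )
    return rows.fetchall()
```

Calling readings_per_text(build_demo(), 'X 1') lists each facet of the invented object with its number of readings. The cross-form data would follow the same pattern, with a link table between crosses and styles.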
At this stage, almost all of the fields in the rundata info file can be
extracted, processed and imported into the relational structure; the
exceptions are 'fornlämningsnummer', 'ristare', 'källor', and whether a
stone is one of a pair or grouping of stones, all of which have
associated complications in terms of how they are parsed or relate to
other fields. Bibliographic data from the literature file is not yet
imported (as it depends to an extent on matching the 'källor' data).
The texts themselves can also be imported, although they aren't yet as
I've been concentrating more on the metadata up until now. However,
some pieces of data derived from the texts *are* imported, such as
onomastic data and the different facets/texts that appear on each
object. The data model does not yet address the proposal to make the
database collaboratively editable. This will inevitably require changes
to the data model, but for the time being I am keen to get the data
structured and queryable over the web as a proof-of-concept.
The issue of how best to process and *record* the corpus text data
requires some discussion. For the texts in rundata, there are first a
series of characters that must be substituted to compensate for
concessions made to the limited range of the CP-1252 character set used
in the data files: ⟨ñ⟩->⟨ŋ⟩, ⟨ô⟩->⟨ǫ⟩, and non-initial ⟨R⟩->⟨ʀ⟩ are the
most frequently occurring examples, but there are others. However there
are other features of the texts which must be marked up in some way, and
suggestions on how best to proceed would be welcome. Texts in the
rundata files mark standard features such as uncertain and supplied
readings, damaged or illegible staves, bind-runes, and the
occurrence of Latin majuscules. How these features might best be
recorded is not immediately clear. I would favour marking up the texts
using XML-style tags, and established TEI-based standards such as those
already used for MENOTA <http://www.menota.org/> or EpiDoc
<http://epidoc.sourceforge.net/> would seem ideal for this. However the
inclusion of XML elements may cause problems with indexing and searching
the texts. Maintaining a parallel markup-free copy of each text for
indexing would solve this, but seems somewhat inelegant. Suggestions
and comments are welcomed. There is also the question of if and how the
corpus should be lemmatised. This is not really my area of expertise,
so I will defer to those more knowledgeable; however, it seems to me that
this is largely a solved problem thanks to the excellent work of the
Nottingham Runic Dictionary, although there is still the issue of how the
two resources might best link and interoperate.
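As a rough illustration of the character substitution step described above, here is a minimal sketch in Python. It covers only the three substitutions named in this message, not the full set needed for rundata. The function name and the reading of "non-initial" as "not the first character of a word" are assumptions made for the example.

```python
import re

# The two unconditional substitutions named in the text. rundata needs
# others as well; they are not listed here.
SIMPLE_SUBSTITUTIONS = {"ñ": "ŋ", "ô": "ǫ"}

# Assumption: "non-initial" means "preceded by another word character",
# so a word-initial capital R (as in a name) is left alone.
NON_INITIAL_R = re.compile(r"(?<=\w)R")


def restore_characters(text: str) -> str:
    """Undo the CP-1252 concessions for the three cases named above."""
    for plain, special in SIMPLE_SUBSTITUTIONS.items():
        text = text.replace(plain, special)
    return NON_INITIAL_R.sub("ʀ", text)
```

With this, restore_characters("RagnvaldR") gives "Ragnvaldʀ": the initial R is kept and the final one is replaced. Whether the same rule is right for every text type (transliteration as opposed to normalisation, say) would still need checking against the data.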
That's probably more than enough from me. There is the germ of a
data-model, and a reasonably flexible programmatic method for getting
data into it from the rundata files. I'd be very interested to hear
your thoughts on what we have so far.
with all best wishes,
> To make the terminology clear I will use the word users to describe
> anybody who only retrieves information from the database. Editors on
> the other hand can view and edit the information. Any scholar should
> be able to get his editorial account.
You raise an important issue. However, I don't think it's quite as
simple as you suggest. I think that simply granting editorial rights to
anyone who applies would not sit well with a resource which, we hope,
will be authoritative. Tarrin has previously suggested that there
should be one or two designated editors for each of the major regions -
Sweden, Denmark, Norway, the British Isles, etc. I tend to agree, and
as I understand it this is a model which has worked very successfully
for the Skaldic Poetry project.
> 1. I agree that the central concept in the database should be an
> object, not the text. Moreover, I propose that every editor will
> manage its own version of text. Let me explain it by an example: there
> are editors A, B and C. Editor A adds initial information about the
> object O: this is a stone, with some inscriptions. He adds the
> photo(s), location where the stone have been found and where it is now
> and so on. In another words, he enters *facts*: information that is
> considered to be true by all other people. Now, he also wants to
> describe inscription on this object. He starts his own branch and adds
> there anything he thinks is true about the inscription. Editor B has
> another opinion about inscription on this object. He reads it
> differently. Editor B starts his own branch of the description. Two
> descriptions from editor A and editor B exist in parallel. Anybody who
> is searching the information about object O will see that there are
> two different opinions about the inscription from object O. Needless
> to say, that editor A can not edit information in the branch of editor
> B and vice versa. But editor A can grant access to his branch to any
> other editor, for example to editor C. Now, editors A and C can
> collaborate on the branch of editor A. Of course, there should be a
> history of modifications and a suitable way to reference this history.
> Thus, if someone is doing a research and reference information from
> one branch, then this reference should be valid in case author of the
> branch will modify it.
Interesting... I see. And at what point would these edits become the
'live' version seen by ordinary users? What you describe might work
best using something like a git-based wiki, with multiple topic branches
existing in parallel which could then be merged by different editors.
However, as I've said, I haven't considered the multiple-editors side of
things in too much detail at this stage. I think it's going to be
tricky to get the balance right. For now I'm just keen to get a
cleaned-up, normalised, relational dataset online and queryable over the
web.
> In my opinion, this mechanism is something that will outperform all
> other databases and will make information representative.
I'm afraid I'm not sure I understand what you mean here.
> The other concept I propose is the use of runes as letters. There
> should be defined alphabets consisting of runes (images with
> description). Each rune can have writing variants. When editor
> describes the inscription he should enter the text with runes using
> the virtual keyboard. This will allow to search information based on
> particular runes and their writing variants. The only thin place here
> is the question who will define alphabets.
I think this would be problematic for a number of reasons, and is most
likely beyond the scope of this project. I assume that you don't mean
using the Unicode Runic range to represent texts? That would be most
inadvisable: Unicode Runic is inadequate for representing runic
inscriptions to the required level of philological detail. (I'm not
a philologist, and even I can see that!) If you mean, instead, using a
more graphical approach to encoding rune-forms, then I think that is
*definitely* beyond the scope of this project. As I understand it, the
former HiT Sentret in Bergen - I think it was them - did a good deal of
work on developing a method of digitally encoding runic inscriptions
based on the forms of the graphs used. Although they produced a few
papers on the subject, as far as I know (and again, I may be wrong) they
met with little success. However if such a scheme were to be
successfully developed separately, the database would certainly benefit
from using it.
> 2. Languages. If we would like to broaden the audience of the database
> users (and probably developers), then I suggest to use English as a
> must have language. For the development this means that tables in
> MySQL should be named in English. I think that I could understand your
> data model, but it will be easier if it is in English.
I broadly agree, but I don't really see the problem: the table and field
names of the data model are never going to be visible to users.
Furthermore, I have been working on the assumption that the web
interface would be internationalised to offer - at the very least -
English and Swedish. I should probably have made that explicit from the
outset! If the web interface is developed with multiple languages in
mind from the start, it will be trivial to add more later, as
necessary. It would be nice to have Norwegian, Danish, German and Dutch
interfaces as well. The table and field names used in the data-model
are based directly on those used in Samnordisk runtextdatabas.
(One of the problems with internationalising the interface is that
almost *all* of the metadata in the Samnordisk runtextdatabas is in
Swedish. Except where it's not: some texts have only Norwegian
translations, not English or Swedish. Having an English interface is no
good if the data itself is not available in English. ;) But this is a
problem for later, I think.)
> 3. Programming language. What programming language do you use?
I use (modern) Perl. CPAN makes the impossible easy. :)
> Do you develop from scratch or use a framework? I propose to use a
> framework and have experience with CakePHP framework. Usage of a
> framework will save our time and help to avoid possible security
I have done no work on the web interface yet; I think there is much to
be discussed before then, and there's little point until the
data-structure is complete and stable. However it should go without
saying that I'll be using a web framework! Currently Dancer
<http://perldancer.org/> is looking like the most suitable option as far
as I'm concerned (PSGI FTW!), with the DBIx::Class ORM. In the
(unlikely) event that Dancer proves inadequate, I will probably go with
> The idea about designated editors indeed makes sense. However, this
> may slow down the process of bringing new information into the
I don't see that as a disadvantage; quite the opposite, in fact. There
needs to be a degree of stability in the data, otherwise its usefulness
as a reliable and citable academic resource may be compromised.
> Maybe I am missing something about the database and you can point this
> out. If there is an object with text and there are no common vision
> (reading, interpretation) of this text, what data is stored in current
> database? Is it possible with a current database to outline that there
> are different theories about this object, to find authors of these
> theories and list of their publications?
Okay, I think I see what you mean. In general the answer is 'yes'... As
I said in my initial post, I've not yet set about importing the content
of the texts themselves into the database. However, the data model
shows how I intend to represent them: for each object there are
potentially one or more 'facets' or 'texts'; each of these in turn may
have received one or more 'readings', 'interpretations'
(normalisations), and 'translations' (into modern English, Swedish,
etc). Each of these readings, interpretations and translations is
derived from a source, and is linked to the corresponding bibliographic
data for that source. (It is no coincidence that the bibliographic
references and the content of the texts are the two main things still to
be covered by my import script - programmatically linking the two in a
reliable way is proving difficult!)
The concept of multiple texts/facets per object and multiple alternative
readings etc of a text occurs in the Samnordisk runtextdatabas. The
former are indicated by '§A'–'§O' and the latter by '§P'–'§Z'.
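The distinction between the two marker ranges can be sketched in a few lines of Python. This assumes that a marker is exactly '§' followed by one capital letter; how the markers are actually embedded in the rundata files is not shown here, and the function name is hypothetical.

```python
def classify_marker(marker: str) -> str:
    """Classify a section marker such as '§A' or '§Q'.

    '§A' to '§O' denote separate texts/facets on one object;
    '§P' to '§Z' denote alternative readings of a text.
    """
    # Assumption: a marker is '§' plus exactly one capital letter.
    if len(marker) != 2 or marker[0] != "§":
        raise ValueError(f"not a section marker: {marker!r}")
    letter = marker[1]
    if "A" <= letter <= "O":
        return "facet"
    if "P" <= letter <= "Z":
        return "reading"
    raise ValueError(f"unexpected marker letter: {letter!r}")
```

So classify_marker('§B') returns 'facet' and classify_marker('§Q') returns 'reading', which is the decision needed to choose the table a given block of text belongs in.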
Since you've raised the issue I think this is probably a good point to
clarify some other aspects of the data model and import script which
might not be immediately obvious. I've tried to make it as clear as
possible, but I appreciate that having worked on it for some time I
probably don't have the same perspective as you do, looking at it with
- Spatial data: the main table for coordinates contains fields for
decimal latitude and longitude on the WGS84 reference ellipsoid, as well
as whether the given coordinates are current or original (which is why
one objekt may potentially have more than one set of coordinates). Three
of the countries which boast runic objects have their own Cartesian
coordinate systems, which are widely used on maps, etc. Consequently
there are three tables linked to the main coordinates table which allow
each set of coordinates to be represented in an alternate format, for
convenience, if it falls within the boundaries of these countries. The
three systems covered are RT90 (Rikets Nät) for Sweden, and grid
references under both the British and Irish Ordnance Survey systems.
- Fragments: while there is in general a 1:1 mapping between an object
and a signum, as indicated by the constraints on the main 'objekt'
table, this is not always the case with the data from rundata. In some
cases, two or more runestone fragments will have been recorded under
separate signa before it was realised that they comprised parts of the
same stone. In such cases, rundata records both signa, but one fragment
has only cursory information recorded against it, and refers instead to
another (main) fragment for full details. The 'fragment' table records
these relationships, so that if you find a sparsely-documented
inscription it's easy to find its mate.
- Aliases: in other cases, an object's signum has actually been changed.
Often there is not an obvious reason for why this might have been done,
and the practice must surely be deprecated as it 'breaks' earlier
citations for no real benefit. But nonetheless it appears to have
happened a not insignificant number of times, some of them recently.
The 'alias' table records obsolete signa alongside the ID of the object
they used to describe. If you don't know whether the signum you want to
look up is current or obsolete, you can be sure of finding the right
object by using a UNION:
    -- The outer SELECT is assumed; the first line of the original query is missing.
    SELECT * FROM objekt
    WHERE objektid IN (
        SELECT objektid FROM (
            SELECT objektid, signum1, signum2 FROM objekt
            WHERE signum1 = 'DR' AND signum2 = '411'
            UNION
            SELECT objektid, signum1, signum2 FROM alias
            WHERE signum1 = 'DR' AND signum2 = '411'
        ) AS signa
    );
(DR 411 is a significant obsolete signum; it now maps to Öl 1, the
- Materials and object types: the 'material' and 'föremål' tables
contain 'tags' or keywords describing what the object is and what it's
made of, extracted from the freetext description in the database. These
fields are not intended to be displayed, but to make searching and
filtering easier, and as an attempt to impose some kind of
data-dictionary onto the descriptions. The import script is intelligent
enough to know that e.g. 'basalt' is a kind of stone, that a 'svärd' is
probably made of metal even if that's not explicitly stated, and that a
'paxtavla' is really the same thing as an 'osculatorium'. :)
- Bracteates: If the object in question is a bracteate, it's probably
been assigned a type according to the established bracteate seriation. Two
tables link the bracteate types to a specific object. I don't know much
about bracteate typology, I must confess, but this seems to have all the
bases covered; the use of a separate table for the types themselves
allows for future expansion if we wish to record further information
about the types, such as their chronology.
- Dating: all objects in rundata are of course dated at the very least
to a period: U, V, or M. For many however, more specific dating
information is available. The import script attempts to assign terminus
post quem and terminus ante quem dates to each object as narrowly as it
reasonably can with the available data. First it uses the information
provided in the freetext 'datering' field; failing that, it falls back
on the TPQ and TAQ dates of Anne-Sofie Gräslund's animal head styles, if
the object has them. Finally, as a last resort, it assigns the broad
TPQ and TAQ dates of the U/V/M periods themselves. As with the material
and object types, this is more to allow for searching (by date ranges)
than for output.
- Location: the Plats -> Socken -> Härad -> Landskap -> Land hierarchy
is fairly self-explanatory, with two caveats. Firstly, only Denmark and
Sweden have different values in 'Landskap'; all other countries have
only one value, which is the country name. Secondly, 'Gamla Socknar':
the 'socken' field in rundata often gives the name of a socken, but then
notes in parentheses that this was the previous socken, and it's now
something else, or that this is the current socken, but it was
previously different. These shifts in parish boundaries are represented
in the data structure with a separate table which records the 'gammal
socken' of the object. A link table is also used since it can't be
modelled as a 1:1 mapping to the 'nuvarande socken': two parishes may
have merged, or one may have been split in two, or the boundaries
between parishes may simply have been moved.
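The three-step fallback described under 'Dating' above can be sketched in Python as follows. This is not the import script itself (which is written in Perl), and it assumes the freetext 'datering' field has already been parsed into a pair of years. The year ranges in both lookup tables are rough placeholder values for illustration only, not the ones the script uses, and the function name is hypothetical.

```python
from typing import Optional, Tuple

DateRange = Tuple[int, int]  # (terminus post quem, terminus ante quem)

# Placeholder ranges for illustration only; the real values used by
# the import script are not given in this message.
STYLE_RANGES = {"Pr1": (1010, 1040), "Pr2": (1020, 1050)}
PERIOD_RANGES = {"U": (0, 800), "V": (800, 1100), "M": (1100, 1500)}


def assign_dates(
    explicit: Optional[DateRange],
    style: Optional[str],
    period: str,
) -> DateRange:
    """Pick the narrowest available TPQ/TAQ pair for an object."""
    # 1. Prefer dates parsed from the freetext 'datering' field.
    if explicit is not None:
        return explicit
    # 2. Otherwise fall back on the animal-head style, if one is recorded.
    if style in STYLE_RANGES:
        return STYLE_RANGES[style]
    # 3. As a last resort, use the broad U/V/M period.
    return PERIOD_RANGES[period]
```

For example, assign_dates(None, 'Pr2', 'V') returns the style range, while assign_dates(None, None, 'M') falls through to the period range. Because every object has a period, a search by date range always has something to match against.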
...and that's probably more than you ever wanted to know! I hope it's
at least somewhat useful, anyway.
> 1. *Interface*: this is the big challenge, but I think we're generally
> agreed on the web as the platform, and we have the expertise to develop
> this. We need to build in multilingual support at all levels of the
> interface and content,
> so this will need to be built into the model too. I think that the
> underlying database, APIs, etc. should be in English, with a
> multilingual interface.
I can make a topic branch and see about changing the table and field
names, if that would be helpful.
> 3. *Runes as objects or text*: As a philologist, I'm concerned that SRDB is
> strongly oriented towards treating runes as objects rather than textual
> sources. Both need to be incorporated into the model, which Marcus has
> achieved to a certain extent.
As an archaeologist, I agree that it's important to have both. Data
about the inscriptions and the objects they're on is often as important
as the texts themselves, and provides valuable contextual information.
Both the SRDB and the new data model provide for this. Indeed, most of
the data model is actually concerned with metadata rather than the
content of the text itself!
> What I would also like to see is the incorporation of data related to
> the text, especially lexical, including names (place and personal).
Placenames and personal names as marked in the SRDB are extracted by the
import script and listed in the 'namn' table which links them to the
texts in which they occur. However, nothing further has been done with
them as yet; they're not lemmatised, and so are stored for the time
being in the form in which they appear in the inscription, which
inevitably results in duplication if the same lexeme(?) appears with
different inflections. It would be great to use the 'stemmed' form
instead, but I'm afraid this is outside my area of expertise.
> I've implemented many issues to do with the second and third points in the
> skaldic database and I'm happy to advise, but I wonder if Marcus has time to
> do some further work on this?
Yes, absolutely. Texts and citations are the two main areas that still
need work, but when it comes to the lexical and philological side of
things I will need guidance from better minds.