On the cleanup of Gregobase


Matthias Bry

Jan 19, 2021, 4:27:36 PM
to Gregorio Users
Hello,

Under the benevolent guidance of Olivier Berten, I have set out to clean up the Gregobase database, to the extent of my poor ability and limited time.

Here is a summary of the open questions, with my preferences.

1. On incipits

By now most entries have incipits; I have added a lot of them.

1a) Non-Latin incipits

The remaining entries with no incipit are those from the Palmer&Burgess gradual that have numeric incipits. As a reminder, in the Palmer&Burgess gradual, Latin incipits are given as titles to English pieces, with a few exceptions, and those exceptions are what I am talking about. I see three possibilities:
1ai) the status quo
1aii) English titles that may or may not be incipits. I dislike this solution because it does not extend well to other languages, and we might have some in the future.
1aiii) Latin titles, namely the incipits of the equivalent Latin pieces.

I strongly favor 1aiii) because it makes the Latin and English versions of the same piece show up side by side in Gregobase, which eases comparison, and also because it is consistent with the rest of the Palmer&Burgess gradual.

1b) Ligatures and accents in titles

1bi) Status quo: we set no standard because people will do whatever they want anyway
1bii) No ligatures (ae, oe) and no accents.
- Pro : this has the best searchability and the best portability.
- Pro : this is how the Cantus index works (see also #7)
- Con : the major drawback is that people who typeset whole books and not booklets, and as such have to generate an index, will not be able to directly use the "title" field of the gabc header to generate the index, but will have to establish a correspondence table, if they want their index titles to be ligatured and/or accented.
1biii) Ligatures but no accents. This is the intermediate solution that does not deteriorate searchability by a lot, but would be convenient for those who will want to have beautiful indices in books with scores sourced from Gregobase.
1biv) Both ligatures and accents. This solution will be the hardest to implement and the hardest to keep maintained and consistent.

I am slightly in favor of 1bii) but I find 1biii) reasonable if the community prefers it.
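
To make option 1bii) concrete, here is a minimal Python sketch of how a title could be reduced to a form without ligatures or accents. The function name `normalize_title` and the ligature table are my own assumptions for illustration; nothing like this is known to exist in Gregobase itself.

```python
import unicodedata

# Ligatures that Unicode decomposition does not split into two letters.
LIGATURES = {"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe"}


def normalize_title(title: str) -> str:
    """Return the title without accents or ligatures (option 1bii)."""
    # NFD splits accented letters into a base letter plus combining marks,
    # so that the marks can be dropped on their own.
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ligatures are replaced last, so that accented ligatures are covered too.
    for ligature, plain in LIGATURES.items():
        stripped = stripped.replace(ligature, plain)
    return stripped


print(normalize_title("Cǽli enárrant"))
```

The same function could be applied to a search query, so that a user typing a ligatured or accented incipit would still find the plain title.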

2. On usage

By now all but two pieces have a usage. Olivier has added a lot of new usages recently and I feel that by now the list should be sufficient. Expanding it beyond, say, 30 different usages (there are currently 21) would make it more like a tag, while I view it rather as a category.
The two missing pieces are Ambrosian and I do not want to guess what usage they are supposed to have.
The addition of new usages will help unclutter the Varia usage and maybe restrict it to pieces that are actually labeled as Varia in the sources.
I have taken the liberty to list fragments of Litaniae (or whole Litaniae) into the Supplicatio usage.
In the coming weeks I intend to review every piece that is labeled Varia, to assess if we should give it another usage.

3. On versions

What is a version? This is an open question to which my answer is: the version indicates to the user which line they want to click among the several lines with the same usage and the same title.

3a : Should a piece (that is, an incipit+usage+disambiguation if needed, e.g. solemn/ferial/festal tone) typically have versions all different from each other?
3ai) Yes,
3aii) No.
I would tend to answer that yes they should, although certain pieces might sometimes justify having different versions of the same name. In any case, I find that many pieces having four or five versions that are different but all tagged "Solesmes" is not satisfactory at all.

3b : Should the version correspond to a single source or to a group of sources?

3bi) Having the version be merely a shortened version of the source, e.g. Vatican YYYY or Solesmes YYYY, with as many versions as sources, plus others for non-sourced scores.

3bii) Grouping sources
In the current state of the database this would mean having the following versions:
- Solesmes 1900s (curiously, as Gregobase stands this covers mainly LU61 which reprints old restitutions from the 1908 LU/GR)
- Solesmes 1930s (AM34/35)
- Solesmes 1960s (Everything from Cantus Selecti 57 to the Gregorian Missal 90)
- Solesmes 2000s (AM 1, 2, 3, AR 2 sq)
- Dominican (distinguishing Jandel, Gillet+Cormier, and Suarez; it would be possible to group the last two in one big Dominican 1900s group)
- And the rest more or less like now plus some cleanup of errors.

3biii) If the answer to 3a is "no", there is the option of indicating only Solesmes, Vatican and so forth everywhere.

4. On tags

As they stand, tags are a mess. I never use them and I do not have many ideas about what to do with them.
In an ideal world, keeping in mind the initial goal of Gregobase, which was semi-automatic booklet generation (and this is how I use it weekly to build a pile of booklets that is starting to get huge), the useful thing would be to be able to cross a version with a number of tags relevant to booklet generation. Those tags would need to be:
- the feast
- the grade (of the feast) for common tones
- the common (where pieces not proper to the feast are to be found)
- the hour.
The formula would be "hour intersect version intersect (feast union grade union common)" and then you would only need to weed out the pieces from the common that are superseded by one that is proper to the feast.
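
The formula maps directly onto set operations. Here is a minimal Python sketch, assuming each tag is available as a set of score ids; the function name and the ids are invented for illustration and do not come from the database.

```python
def booklet_candidates(hour, version, feast, grade, common):
    """hour intersect version intersect (feast union grade union common).

    Each argument is a set of score ids carrying the corresponding tag.
    """
    return hour & version & (feast | grade | common)


# Invented ids, purely for illustration.
hour = {1, 2, 3, 4, 5}
version = {1, 2, 3, 9}
feast = {1}
grade = {2}
common = {3, 4}

print(sorted(booklet_candidates(hour, version, feast, grade, common)))
```

The last step, removing pieces from the common that are superseded by a proper one, is not covered by this sketch.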

However, getting anywhere close to this result would be an absolutely massive chunk of work. The work could be assisted (e.g. with tools to speed up manual entry of this info) but not automated, unless one wants to develop an AI that parses scans of liturgical books into the list of pieces of each hour of each feast.

As such I do not see tags as a priority because the cost/benefit does not look favorable - the tag system would be useful only if it is complete, unlike usages or incipits which start to be useful as soon as you start entering them.

Something that could be done with less effort would be to add the Occasion field, which is already a standardized gabc header field, and maybe an Hour field that is automatically "Mass" if the usage is in/gr/al/of/co/ky/pr/sq, and one of ten values (8 hours + mass + others) for other usages.
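
A minimal Python sketch of the automatic part of such an Hour field, assuming usages are stored as the short codes listed above. The function name `default_hour` and the code "an" used in the example are assumptions for illustration.

```python
# Usage codes that imply the Mass, as listed above.
MASS_USAGES = frozenset({"in", "gr", "al", "of", "co", "ky", "pr", "sq"})


def default_hour(usage):
    """Return "Mass" for Mass usages, else None (to be chosen by hand)."""
    return "Mass" if usage.lower() in MASS_USAGES else None


print(default_hour("gr"))   # a Mass usage
print(default_hour("an"))   # hypothetical non-Mass code, left to the contributor
```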

5. On EUOUAE

We need to decide if (5i) we reflect the source and add the EUOUAE if and only if the source has it, (5ii) we do not want EUOUAEs, trusting mode information, or (5iii) we want EUOUAEs on all antiphons. As of now I tend to always include a noted first verse of each psalm in my booklets, which eliminates the need for EUOUAEs (see the default layout in https://bbloomf.github.io/jgabc/psalmtone.html for instance)

6. On the commentary

This seems to me to raise no question, since the gabc documentation defines this field: "source of the words of the score". But if the community has other views, it may be interesting to hear them. This field is generally well respected although seldom completed.

7. On the Cantus ID

Oh boy, we are entering a world of pain.
First, it must be noted that having the Cantus ID automatically gives the Usage and Occasion info, and also the Incipit, although the incipits in the Cantus database can be a bit of a mess at times.
Conversely, having the incipit, usage and occasion may help find the Cantus ID... or it may not.
In any case, the Cantus database is agonizingly slow. Would it not be useful to have a dump of only its main tables to speed up search by incipit?
My point is, if we want links to the Cantus database, we need to be able to search it semi-automatically with queries getting responses in less than twenty seconds (on good days).

However, the Cantus ID problem is not a priority to me (though it seems to be one to Olivier) because the Cantus DB is a tool for scholars while Gregobase is more of a tool for cantors who need to put together booklets, or possibly people who want to integrate chant into breviary/missal apps, and so forth.

In any case, I would love to hear the thoughts of the community

Matthias Bry

Sr. Maria Ruth Malagoli

Jan 20, 2021, 12:56:11 AM
to gregori...@googlegroups.com
Dear Matthias, 
thank you so much for this job! Here's my opinion about your topics:

1. On incipits
[...] 
I strongly favor 1aiii) because it makes the Latin and English versions of the same piece show up side by side in Gregobase, which eases comparison, and also because it is consistent with the rest of the Palmer&Burgess gradual.

I don't sing in English nor use English books, so I guess my opinion is not useful here, but as far as I can see I agree with you (1aiii).

1b) Ligatures and accents in titles

1bi) Status quo: we set no standard because people will do whatever they want anyway
1bii) No ligatures (ae, oe) and no accents.
- Pro : this has the best searchability and the best portability.
- Pro : this is how the Cantus index works (see also #7)
- Con : the major drawback is that people who typeset whole books and not booklets, and as such have to generate an index, will not be able to directly use the "title" field of the gabc header to generate the index, but will have to establish a correspondence table, if they want their index titles to be ligatured and/or accented.
1biii) Ligatures but no accents. This is the intermediate solution that does not deteriorate searchability by a lot, but would be convenient for those who will want to have beautiful indices in books with scores sourced from Gregobase.
1biv) Both ligatures and accents. This solution will be the hardest to implement and the hardest to keep maintained and consistent.

I am slightly in favor of 1bii) but I find 1biii) reasonable if the community prefers it.

Good point. I am in favor of 1bii) too. 
 
2. On usage

Nothing to say here...

3. On versions

What is a version? This is an open question to which my answer is: the version indicates to the user which line they want to click among the several lines with the same usage and the same title.

My answer: the version indicates the different melodic restitution of a piece with the same text and the same usage.
Thus, it is very important to distinguish antiphons which may begin with the same words but end differently (e.g. Ant. Amen amen dico vobis quia nemo propheta vs Ant. Amen amen dico vobis quod vos etc.), classifying them with an appropriate incipit, because otherwise different pieces may appear as different versions.
Often, a different version means a different printed edition, but it may be nice to have the possibility to add, for instance, the particular version of a manuscript.

3a : Should a piece (that is, an incipit+usage+disambiguation if needed, e.g. solemn/ferial/festal tone) typically have versions all different from each other?
3ai) Yes,
3aii) No.

Yes, no doubts.
 
3b : Should the version correspond to a single source or to a group of sources?

3bi) Having the version be merely a shortened version of the source, e.g. Vatican YYYY or Solesmes YYYY, with as many versions as sources, plus others for non-sourced scores.

3bii) Grouping sources
In the current state of the database this would mean having the following versions:
- Solesmes 1900s (curiously, as Gregobase stands this covers mainly LU61 which reprints old restitutions from the 1908 LU/GR)
- Solesmes 1930s (AM34/35)
- Solesmes 1960s (Everything from Cantus Selecti 57 to the Gregorian Missal 90)
- Solesmes 2000s (AM 1, 2, 3, AR 2 sq)
- Dominican (distinguishing Jandel, Gillet+Cormier, and Suarez; it would be possible to group the last two in one big Dominican 1900s group)
- And the rest more or less like now plus some cleanup of errors.

3biii) If the answer to 3a is "no", there is the option of indicating only Solesmes, Vatican and so forth everywhere.

Actually, if a version is a melodic restitution, it always corresponds to a printed or otherwise written (e.g. manuscript) source, so maybe I am in favor of 3bii).
 
4. On tags

I think that if the database is well organized and cleaned up, it would be quite easy to find the chants needed even without tags.
 
5. On EUOUAE

We need to decide if (5i) we reflect the source and add the EUOUAE if and only if the source has it, (5ii) we do not want EUOUAEs, trusting mode information, or (5iii) we want EUOUAEs on all antiphons. As of now I tend to always include a noted first verse of each psalm in my booklets, which eliminates the need for EUOUAEs (see the default layout in https://bbloomf.github.io/jgabc/psalmtone.html for instance)

I am for (5i) for some reasons:
- in this way, the database file is complete and coherent with the source;
- since there are different ways to sing psalm differentiae according to local usage and tradition or the actual skills of the choir, any choir is free to use the Euouae proposed by the source or to replace it with another suitable differentia;
- adding "standard" Euouae to the antiphons that don't have it (5iii) would not always be easy or correct, from a theoretical and historical point of view;
- not including Euouae at all (5ii) would be a source of error and an obstacle for beginners not used to choosing psalm tones + differentiae, I think.
 
7. On the Cantus ID

However, the Cantus ID problem is not a priority to me (though it seems to be one to Olivier) because the Cantus DB is a tool for scholars while Gregobase is more of a tool for cantors who need to put together booklets, or possibly people who want to integrate chant into breviary/missal apps, and so forth.

I agree with you on this.

Thank you again! God bless you,
Sr. Maria Ruth Malagoli osb


Rob Leduc

Jan 20, 2021, 1:50:49 AM
to Gregorio Users
Thanks for the post about data cleaning.  It's really important and it can be hard to know exactly what the organizers want in some of the fields. 

I am just getting started here, and everyone knows more about chant than I. However, I did work in clinical trials for many years and have some general experience with the organization of databases, and gathering and cleaning data. 

Two thoughts before replying to your specific questions afterwards.

Exactly what fields to collect, and how they are defined, should flow from the uses and users of the database the organizers wish to support.  How will your users interact with the database to perform these specific tasks?  Then, for implementation, error correction, and maintenance, you need the system of fields to be as short and simple as it can be to support these tasks. For example, you mention one use as to support cantors or music directors creating chant booklets for a particular mass or office. How will they do that?  If there is no system of tags for feasts/seasons etc., then will they have to search incipit by incipit?  This assumes they actually know the incipits they are looking for.

Secondly, freely entered text fields are usually garbage.  Differences in capitalization, spelling errors, etc. kill the ability to search those fields.  Obviously, some of them have to be free text, like the incipit and the chant itself, and so we need proofreaders.  In the data collection business, we usually had two people enter the same data independently of one another and then reconciled differences ("double data entry").  Without that, you'll have much higher quality data if you can organize the other fields you collect like usage is currently designed, as a pull down menu with fixed options.  Then you need a document defining these fields so people know what you want.
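
To illustrate the reconciliation step of double data entry, here is a minimal Python sketch that lists the fields on which two independent entries of the same score disagree. The field names and values are invented for the example and are not taken from the Gregobase schema.

```python
def reconcile(entry_a, entry_b):
    """Return {field: (value_a, value_b)} for every field where the entries differ."""
    fields = sorted(set(entry_a) | set(entry_b))
    return {
        field: (entry_a.get(field), entry_b.get(field))
        for field in fields
        if entry_a.get(field) != entry_b.get(field)
    }


# Two people key in the same score independently (invented values).
first = {"incipit": "Beatus vir", "usage": "an", "mode": "1"}
second = {"incipit": "Beatus vir", "usage": "an", "mode": "7"}

print(reconcile(first, second))
```

Only the fields returned by `reconcile` need a human decision; everything else was keyed identically by both people.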

Although that is a lot of work on the volunteer organizers, it will pay off many times over in data quality and ease of maintenance.

My two cents on the rest:

1.a.1 non-Latin Incipits.  Using Latin incipits when one is not given assumes both the data entry person and the person doing this search know the correct equivalent title.  But if it is cleaned properly, then I agree that having languages other than Latin turn up in a search by incipit is an advantage.  If this is currently only one or just a couple of sources, it shouldn't be hard for editors to keep up with submissions and review the incipits, but it might take some dedication on the part of the organizers themselves.

1.b. Ligatures and accents.  I'm for 1.b.ii, omitting both from incipits, especially if that is a major search field. We're just talking about one field (title/name) so someone adding other fields with ligatures in their gabc can edit the few names they will deal with in any particular project.  It opens up the question what to do about ligatures in the actual text of the chant, I suppose.

2. Usage. I do think it is important to have the Varia cleaned up properly, which is going to require periodic editorial review.  All items with blank or Varia for usage will need some editorial help.  Giving some pointers or examples in an accompanying document might prevent, say, settings of the Venite as Varia or Canticle, or Invitatory antiphons, typically labelled Invit, as the Invitatory.  Depending on available options in the list, of course.

3. Versions. The user is already clicking on a specific book; perhaps the source can be used to assign the version field without additional user input?  It's best not to depend on the user to enter things that can be obtained computationally/through logic in order to avoid typos.
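
As a rough illustration of deriving the version from the source, here is a minimal Python sketch. The two entries follow the grouping proposed under 3bii earlier in the thread; the dictionary keys are hypothetical source identifiers, not the ones actually stored in Gregobase.

```python
# Hypothetical source identifiers mapped to version labels (grouping from 3bii).
SOURCE_TO_VERSION = {
    "LU61": "Solesmes 1900s",
    "AM34": "Solesmes 1930s",
}


def version_for(source_key):
    """Return the version label for a source, or None if it must be set by hand."""
    return SOURCE_TO_VERSION.get(source_key)


print(version_for("LU61"))
print(version_for("unlisted source"))
```

Scores without a source, or from a source not in the table, would still need a version chosen by an editor.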

4.  Tags.  This is the heart of it and why I referred to thinking about the tasks the organizers wish the database to support.  It seems like it would be best to pick a selection of fields you want to collect to 1) uniquely identify a particular chant and 2) allow a user to select a label to get chants for a particular mass or office.  Then those tags should be entered using pull down menus rather than allowing free text entry.  These would then function like the Usage rather than free entry.  Then the existing tag entry system should probably be discarded.  The system of tags in the Cantus database is probably too complicated but you could choose a subset that maximized "coverage" while minimizing list of options in various fields.

5. EUOUAE - The source I've been working with often only gives a mode 1-8 and no differentia, but supplies the EUOUAE.  Trusting that the entered mode is correct is trusting an idiot like me to pick the right differentia or it would go unreported; not to mention the problem of typos in the mode field(s).  So I think reflecting the source is probably best as the other data you get from that source may be tied up with the editorial choice of the original editors regarding presentation of EUOUAE in that source.  A general principle is that you don't want to burden your data entry people with thinking - they should be able to just type what they see.

6. Commentary. It can be a convenience to provide this, but it should be expected to match the source, and different sources may have different schemes for doing this. Proof readers and data entry people need clear direction about what you want here - match the source? Required? If so, proof readers should proof read it.

7.  Cantus ID. The question, as you mention, really hangs on how you expect the user to use the data base.  For the users and functions you wish to support, is the user likely to want the Cantus ID?  Would they be able to search the Cantus data base successfully with a limited number of well-collected tags from Gregobase? Are they likely to come to Gregobase with a Cantus ID to search for a chant?  Or is this about some kind of automated look-up?  It is hard to coordinate a database with someone else's database external to your own group and maintain that coordination over time.  And it's not worth much as a field if it is not confirmed by proof readers. 

Whatever you do collect, every field should be verified by the proof reader, not just the chant itself.

Thanks for putting up with my ranting.  And thanks to all who organize and contribute to this great resource!

Rob Leduc

Matthias Bry

Jan 20, 2021, 2:51:16 AM
to gregori...@googlegroups.com
A great many thanks to Sr. Maria-Ruth and Rob for their answers.

Both answers converge towards (1aiii) (Latin incipits), (1bii) (no
ligatures, no accents in titles) and (5i) (match EUOUAE of the
source).
I very much agree with Sr. Maria-Ruth's arguments in favor of matching
EUOUAE of the source. I also agree that text disambiguation should
happen in the title using three dots e.g. "Pange lingua...
certaminis", as is done in all chant books.

(On a side note to Rob for 1aiii, there is currently one source in
English and a few scores in Polish so this choice is, for now, fairly
easy to maintain, especially given that the Polish scores all have the
Latin incipit in the Remarks field. This is no more than 20 entries to
correct).

I agree with Rob that a database's editorial choices should be driven
by the user's process (but Olivier ultimately owns the site so we are
not going anywhere without his approval or at least benevolent
neutrality ;-) )
Here is my booklet process at least:
- I go to DivinumOfficium.com (for Monastic and EF) or
SocietasLaudis.org (for OF) to get the full text that is ultimately
going into the booklet (Latin+Translation). So I know beforehand all
the incipits that I need.
- I search the first piece by incipit, e.g. first antiphon, or
introit. I then click on all versions of it (because as it stands I do
not trust the Version field), and choose the one that I want
(Dominican, Vatican, New Solesmes w/o rhythmic signs, Old Solesmes,
Even Older Solesmes as reflected in the LU), and download the GABC.
- I click on the little arrow to the right above the source image down
on the right column to get the next score in that source, which should
be the next antiphon of the hour that I want, or the gradual that I
want, rinse and repeat. If I'm not lucky and it is not there, I will
go back to the previous step and search by incipit.

In light of this:
Re: Tags : The only way I would use Tags would be if we had a) a sound
Version field b) tags by hour c) tags by feast d) the ability to cross
those three pieces of information.

Re: Cantus ID : I will very occasionally come in from the Cantus db
searching if a particular chant has been restituted, but even then I
will search by incipit and not by Cantus ID. In my view, the use of
the Cantus ID in Gregobase is the other way: be able to look at the
manuscripts, coming from a particular restitution that has been
entered into Gregobase, in order to assess its quality. Since I am not
a scholar, even though I am familiar with ancient neums, this use case
is for another category of users than me :-)

Re: Versions : I would save quite some time if we got rid of the
ubiquitous "Solesmes" version that gives no useful info. Otherwise I
have no strong opinion.

Re: Commentary : It should match the source and therefore be optional
since not all sources have them. I think this is the natural answer.
If the source has it, it is nice to have, but not a big deal if it is
not filled in.

Have a blessed day,

Matthias Bry
> --
> Gregorio homepage: http://gregorio-project.github.io
> Archives for the old mailing list: http://www.mail-archive.com/gregori...@gna.org/
> To report a bug, please post to: https://github.com/gregorio-project/gregorio/issues
> ---
> You received this message because you are subscribed to a topic in the Google Groups "Gregorio Users" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/gregorio-users/3P1hHtW1XOI/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to gregorio-user...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/gregorio-users/4fbecfa4-4fb4-4e07-8204-fdeb06fdec3an%40googlegroups.com.

Rob Leduc

Jan 24, 2021, 8:58:05 AM
to Gregorio Users
Matthias,

Thanks very much for your post about methods of preparing books and cleaning the data.  As a non-expert, it is extremely helpful to me to know these things and to have clear definitions for fields so that I can provide good information.  A couple of additional points or questions:

1) Training/Instructions.  I looked over at Cantus to see what they have in terms of instructions for data entry and found these:


Now, these are obviously far more extensive than Gregobase would need, and are made explicitly for transcribing from manuscripts (rather than modern books) to create (among other things) searchable standardized text and melodies, not just searchable incipits, for scholarly purposes.  But I think it would be helpful to have something similar (and much shorter) on the Gregobase site addressing the data requirements needed to accomplish the goals of the Gregobase site, which may help cut down on future cleanup work.  I could try to put together a first draft, if you like, although as a non-expert with little experience I'm not sure I am the best choice, and it would need heavy review.

2) More on standards.  I'd also like to ask what you think about my particular case of interest.  I've been focusing on the Nocturnale Romanum out of a personal interest in matins.  I gather the work is a little idiosyncratic, but its method of presenting antiphons is to have just the incipit with mode and EUOUAE info prior to the psalm(s), and then the full antiphon after the psalms, without mode, star for the incipit, or EUOUAE.  For example, scroll briefly to page 5 of the source here:  https://gregobase.selapa.net/source.php?id=23&images=1&index=3 to see the treatment of e.g. Beatus vir prior to Psalm 1 in Nocturn I of a Sunday.

It seems clear we would not log both entries in the data base - anyone who wanted an incipit-only snippet could edit down the larger one.  But I'm not sure if I should then add a star after the incipit or add EUOUAE, even though one is lacking in the source at that entry.  

I would propose entering the antiphon linked to the page after the psalm where the full antiphon appears, with mode tags copied from before the psalm, and a star after the incipit (identified as given in the source prior to the psalm).  I would leave out EUOUAE, but if people feel strongly otherwise, I would be happy to copy that as well. 

Currently a number of the entries (looking at the 1st nocturn for Sunday) are not standardized this way. What do you think is best?

3)  Disambiguation.  I am not sure how to address determining when I need to add text to an incipit in the name field for disambiguation purposes.  For example, the case of Beatus vir, above.  A quick search of the Cantus Index for Beatus vir gives two pages of hits; even adding the antiphon tag only reduces to 57 possibilities, although a number of these can be eliminated by eye as they do not begin with the text Beatus vir, but merely contain it.  How should I determine a disambiguated entry from this?  Or is there a better way?  This is the kind of problem you run into when you go beyond a simple "key what you see" instruction for data entry.

4) Cantus ID. The case of Beatus vir above is a good case study of the difficulty.  I am at a loss to identify the proper one.  If I had instructions on how to best search and how to determine what constitutes a match, I would be happy to add these ids, at least in my own work.

5) Tags.  The current usage menu seems quite serviceable, in my definitely inexperienced opinion.  If we wanted to move towards a calendar tag, I think it is possible in a limited sense.  While the Cantus database lists some 1700+ tags, many of these are specific to pretty specialized feasts, e.g. Edwardi Conf., de morte, and a subset are so antiquated or local that they would not turn up in the printed (non-manuscript) sources that Gregobase would list.  So the Cantus list could be cut down greatly to the temporale tags, major feasts from the sanctorale, tags for the commons, and then maybe a generic tag for "proper to the feast" for feasts not listed.  Again, this would have to be made available as a pull down menu of choices (perhaps separated into different menus for temporale/sanctorale lists to avoid a single long menu) to avoid typos, etc. 

Whatever guidance you can provide is most welcome!

Best wishes,

Rob

Matthias Bry

Jan 24, 2021, 9:38:21 AM
to Gregorio Users
Hello Rob,

1) Training: I agree that a concise set of instructions would be quite helpful. Whoever drafts it (and I might, or we could have a group session on Discord : https://discord.gg/p5zZgS2E5P ), it needs to be reviewed by the major contributors. But most importantly, we need input from Olivier, who would be the one promoting those instructions on the site.

2) Semidoubled antiphons (the technical term for what you describe): 
[ For the record, the NR2002 semidoubles antiphons for ferias and feasts below double rank, because it does not conform to the 1960 code of rubrics (which doubles all antiphons) but to Divino Afflatu (1911) for some reason (the reason being that many a traditionalist holds in contempt the 1954-1962 reforms; an endless debate with good arguments on both sides.) ]
For the sake of consistency between semidoubled and doubled antiphons in the NR (that is, all those found in the proper of saints), I would insert both the incipit star and the EUOUAE in the scores. Of course there must be only one database entry per antiphon from the NR.

3) Disambiguation: I am afraid the Cantus database will be of little to no help with this. What I currently do is look up (directly in Gregobase) the incipit of the score that I am transcribing. If I see that it needs disambiguation, I lengthen it or triple-dot it accordingly. I agree that this is unsatisfying because it gives contributors excess freedom. However, the number of these pieces is not so great that maintenance of the database with this method would be impossible.

4) There are some experts of the Cantus DB here, I will let them speak.

5) I agree with the sentiment. An intermediate solution would be to have one of those "smart" fields where it gives suggestions from a list based on what you type, and gives also the possibility of adding a new value. Do not underestimate the vast number of saints who have proper antiphons at least for the Magnificat and Benedictus in at least one rite: it seems to me that having a set list (like we have for usages) would not be a good idea because we would need to add new entries constantly.
In any case, this would mean significant involvement of Olivier: to be honest I do not have the courage to code it since I find the DB very usable as is, once the imprecise "Solesmes" version is gone.

Yours

Matthias

Olivier Berten

Jan 25, 2021, 6:14:16 PM
to gregori...@googlegroups.com
Hi!

First of all a big "Thank you" to Matthias for his work on the database!

I'll just throw here a few random thoughts.

First a little context: I'm a very amateur programmer and I started this project as a visual version of http://www.caecilia-project.org/ after having taken a small part in the development of Gregorio, in a very specific context, as my knowledge of the Gregorian repertoire is fairly limited. Given the few sources used back then, the only meaningful versions were Solesmes and Vatican. This has of course changed. And the fields as described at https://gregobase.selapa.net/?page_id=18 were the ones for which I had data from Andrew Hinkley's gabc files.
A few more fields came in the meantime, following specific requests. But indeed now, seeing how the tags are growing, adding feast and hour fields seems to make sense. On that topic, Benjamin Bloomfield has done already quite some work in linking Gregobase to feasts and hours with his jgabc tool https://github.com/bbloomf/jgabc/blob/master/propersdata.js

About Cantus ID, it has indeed been requested by some scholars and I thought it might be useful to use it in order to be the common field between different versions as incipit might not be really consistent. But after Matthias' work of standardization, this might be a useless complication.

Since version 4 of Gregorio, there is an <eu> tag meant to enclose EUOUAE. If it is used consistently, it would be easy to programmatically remove the EUOUAE and offer a version without it, as is already done for hymns with only the first verse.
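
To illustrate the idea of removing the EUOUAE programmatically, here is a minimal Python sketch. It assumes the <eu> ... </eu> pair wraps the whole EUOUAE (text and notes) and is never nested; the function name `strip_euouae` and the sample score are hypothetical and are not taken from the Gregobase code. Any bar left behind after the removed block would still need a cleanup pass.

```python
import re

# Non-greedy match of everything between <eu> and </eu>, including newlines.
EU_BLOCK = re.compile(r"<eu>.*?</eu>", re.DOTALL)


def strip_euouae(gabc_body: str) -> str:
    """Return the gabc body with every <eu>...</eu> block removed."""
    without_eu = EU_BLOCK.sub("", gabc_body)
    # Collapse the runs of spaces or tabs left where a block was cut out.
    return re.sub(r"[ \t]{2,}", " ", without_eu).strip()


# Hypothetical score fragment, for illustration only.
sample = "Al(f)le(g)lu(h)ia.(h) (::) <eu>E(h) u(h) o(g) u(f) a(gh) e.(g)</eu> (::)"
print(strip_euouae(sample))
```

Scores that do not use the tag pass through unchanged, so the same function could be applied to the whole database without first checking which entries are tagged.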

Development has become very slow since the first release of Gregobase because I went from newly married to father of four boys, so my free time is rather scarce ;-) But seeing some recent interest in the project, I pushed myself to do some updates that had been waiting for too long. All that to say: don't expect major changes any time soon, but at the same time, don't hesitate to tell me about specific needs... or to send some pull requests... (sorry... the code documentation is close to nothing). I'll try to follow up a bit more than in the last few years...

Yours,

Olivier

--
Gregorio homepage: http://gregorio-project.github.io
Archives for the old mailing list: http://www.mail-archive.com/gregori...@gna.org/
To report a bug, please post to: https://github.com/gregorio-project/gregorio/issues
---
You received this message because you are subscribed to the Google Groups "Gregorio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gregorio-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gregorio-users/8bbdd379-15bd-435e-8bf0-6876be9444d2n%40googlegroups.com.

Sr. Maria Ruth Malagoli

Jan 26, 2021, 4:02:22 PM
to gregori...@googlegroups.com
A small addition (I don't know if this is off topic; if so, please forgive me and disregard this message): what about an implementation on the nabc side? That is to say: would it be possible to provide the nabc code (or the gabc + nabc code) as a version of the chants, or would this require an impossible amount of work?
I see the difficulties, but having a rough base to start with would be a great help for those who need to encode nabc scores...

Or maybe some other databases do already exist on this side?

Thank you all!
God bless you,

Sr. Maria Ruth

Olivier Berten

Jan 27, 2021, 5:44:48 AM
to gregori...@googlegroups.com
Since, as far as I know, Gregorio doesn't do nabc by itself, I'd go for a double version specifying both the modern version and the manuscript: Solesmes 1974 + CH-SGs 359, for instance. I don't know how useful it would be to duplicate the version field... And the way to reference the manuscript is also questionable... I'd go for RISM, but maybe Cod. Sang. 359 would be more readable...

Olivier


Rob Leduc

Jan 27, 2021, 12:20:43 PM
to Gregorio Users

Hello Matthias (et al.),

1) Training: I agree that a concise set of instructions would be quite helpful. Whoever drafts it (and I might, or we could have a group session on Discord : https://discord.gg/p5zZgS2E5P ), it needs to be reviewed by the major contributors. But most importantly, we need input from Olivier, who would be the one promoting those instructions on the site.

I could be part of this. I'm based in the US, so time zones may be a bit of an issue, but I'm retired, so my schedule is flexible. I think it would be good to have at least an outline before such a discussion, which I'll try to type up unless someone beats me to it.
 
3) Disambiguation: I am afraid the Cantus database is of little to no help with this. What I currently do is look up (directly in Gregobase) the incipit of the score that I transcribe. If I see that it needs disambiguation, I lengthen it or triple-dot it accordingly. I agree that this is unsatisfying because it gives contributors excess freedom. However, the number of these pieces is not so great that maintenance of the database with this method would be impossible.

OK - so it's local disambiguation in terms of the Gregobase database rather than some kind of global disambiguation or uniform practice across all printed books; that is, our disambiguated titles don't have to fit into a common scheme with other databases.

4) There are some experts of the Cantus DB here, I will let them speak.

Haven't heard from anyone yet, except from Olivier.  Just reiterating that if there were defined standards for how to count something as a match, then I'd be happy to try to add these.  I just need to know what they are.  But I suppose that lack of a response would be a demonstration of either the difficulty involved in doing so, or perhaps lack of interest in the result.
 
5) I agree with the sentiment. An intermediate solution would be to have one of those "smart" fields where it gives suggestions from a list based on what you type, and gives also the possibility of adding a new value. Do not underestimate the vast number of saints who have proper antiphons at least for the Magnificat and Benedictus in at least one rite: it seems to me that having a set list (like we have for usages) would not be a good idea because we would need to add new entries constantly.
In any case, this would mean significant involvement of Olivier: to be honest I do not have the courage to code it since I find the DB very usable as is, once the imprecise "Solesmes" version is gone.

A smart field would be great; I'm not sure what the capabilities of the interface are. Alternatively, we could probably settle on a fixed set of tags for the temporale, the commons, and the major feasts of the sanctorale. If we split those into two sets, temporale + commons vs. sanctorale, I think the menu for the former could be fixed, like the current usage field. I didn't mean to gloss over the variety in the latter, but simplifying things may at some point require lumping a lot of feasts of particular saints together. As a solution, an updatable smart field for the occasion of chants proper to a feast would require periodic review, like the varia field, but would probably change much less quickly. In fact, if an alert were raised for each submission, the tags could be reviewed as they came in. This seems similar to what is available in the Cantus database. I'm just worried about all the ways someone might think to designate "the first Sunday of Advent" in a free-response field.

Adding a field for "mass" or "particular hour" could also help, although the former is probably identifiable from the usage.
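
To make the "smart field" idea above a bit more concrete, here is a rough Python sketch of the logic behind it: suggestions drawn from a known list as the user types, with the option of adding a new value. Everything in it (the `SuggestField` class, the `normalize` helper, the Latin feast labels) is hypothetical and only illustrates the behaviour; it says nothing about how the Gregobase interface is or would be written. Note that the normalization only folds case, accents, and spacing, so it would not merge genuinely different wordings of the same feast.

```python
import unicodedata


def normalize(label: str) -> str:
    """Fold case, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", label)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(unaccented.casefold().split())


class SuggestField:
    """A list of known values that suggests matches and accepts new entries."""

    def __init__(self, values):
        self._values = {}  # normalized form -> label as first entered
        for value in values:
            self.add(value)

    def add(self, label: str) -> str:
        """Store a label unless an equivalent exists; return the stored one."""
        return self._values.setdefault(normalize(label), label.strip())

    def suggest(self, typed: str, limit: int = 5):
        """Return up to `limit` stored labels containing the typed text."""
        key = normalize(typed)
        if not key:
            return []
        matches = [lbl for norm, lbl in self._values.items() if key in norm]
        return sorted(matches)[:limit]


# Hypothetical feast labels, for illustration only.
field = SuggestField(["Dominica I Adventus", "Dominica II Adventus", "S. Benedicti"])
print(field.suggest("advent"))
print(field.add("  dominica i adventus "))
```

Because `add` returns the label already on file when an equivalent one exists, a submission that differs only in case or spacing would not create a duplicate tag, which would keep the periodic review lighter.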

Yours,

Rob





Sr. Maria Ruth Malagoli

Jan 28, 2021, 12:55:53 AM
to gregori...@googlegroups.com
On Wed, Jan 27, 2021 at 11:44, Olivier Berten <olivier...@gmail.com> wrote:
Since, as far as I know, Gregorio doesn't do nabc by itself, I'd go for a double version specifying both the modern version and the manuscript: Solesmes 1974 + CH-SGs 359, for instance. I don't know how useful it would be to duplicate the version field... And the way to reference the manuscript is also questionable... I'd go for RISM, but maybe Cod. Sang. 359 would be more readable...

Dear Olivier,
thank you.

I don't know whether this would be a useless waste of time. I'll leave the choice of the best way to do this, if it is possible at all, to the developers; I simply note that having a database of the available nabc scores would be a help.

Thank you very much!
God bless you,

Sr. Maria Ruth osb

Olivier Berten

Jan 28, 2021, 1:56:03 PM
to gregori...@googlegroups.com
Dear Sr. Maria Ruth,

My remark about duplicating the version field was really an open one. I have the feeling it would make things unnecessarily confusing, since it seems to me that very few people would use nabc... But I might be completely wrong...

Checking for the available nabc scores in Gregobase is technically easy, as these should be the only ones containing the pipe character (|).

There's currently only one in the database (which is pretty normal since it wouldn't render correctly until very recently)...
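
The pipe check described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `has_nabc` and the two sample scores are hypothetical, and the check looks for a pipe inside a parenthesised note group after the `%%` header separator rather than anywhere in the file, which is slightly stricter than a plain search for the character.

```python
import re

# Contents of each parenthesised note group, e.g. "f|vi" in "(f|vi)".
NOTE_GROUP = re.compile(r"\(([^()]*)\)")


def has_nabc(gabc: str) -> bool:
    """Guess whether a gabc score carries nabc by looking for '|' in notes."""
    # The score body follows the '%%' separator; without one, scan everything.
    _, separator, body = gabc.partition("%%")
    if not separator:
        body = gabc
    return any("|" in group for group in NOTE_GROUP.findall(body))


# Hypothetical scores, for illustration only.
with_nabc = "name: Example;\nnabc-lines: 1;\n%%\n(c4) Al(f|vi)le(g|pu)lu(h)ia(h) (::)"
plain = "name: Example;\n%%\n(c4) Al(f)le(g)lu(h)ia(h) (::)"
print(has_nabc(with_nabc), has_nabc(plain))
```

Restricting the search to note groups avoids a false positive if a pipe ever shows up in a header field or in the lyrics.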


Yours,

Olivier
