Questions in prep for a course: backing up, archiving, collaboration


Ryan Pennington

Feb 20, 2014, 12:58:41 AM

to say...@googlegroups.com

I am helping to prepare a workflow for a language discovery course in Papua New Guinea in April. This course will involve 10 different Papua New Guinean translation teams. We plan to collect texts with portable recorders, followed by importing them into SayMore for basic annotation. The preference will be written transcription and translation, but at the minimum we hope for every file to at least be annotated orally. Some of these files may then be imported into FLEx for interlinearization, while some annotations may be pulled out for use with literacy materials.

I have a few questions to ask regarding our workflow, so please feel free to answer any of them. This is an exciting opportunity to collect data from many under-analyzed languages of Sandaun Province, and to teach Papua New Guineans about linguistic principles.

My first question is: How can we protect ourselves from computer failure? Each of the ten groups will have a computer, and some of them may have multiple computers. How can we back up SayMore data? With our last language discovery course we used a local server onto which we backed up FLEx projects. I recognize that these files will necessarily be much larger in file size. The idea we have so far is to have SyncBack automatically back up the SayMore project folder. Are there any other options?
My second question is: Is there any way to archive an entire project in RAMP? I am trying to protect myself from having to return after the course is over, and separately archive 10 sessions for 10 different groups. I recognize that the various recordings constitute separate 'events', with different sets of metadata… Still, what else can be done? Can I simply use SayMore to create IMDI files and then RAMP those files? If that is the case, then perhaps I would need to open those projects in Arbil and do some bulk editing of certain metadata fields to ensure consistency. Do you have any thoughts on the issue of archiving?
My third question is: How might two different users (on two different computers) collaborate? Maybe we would just have to create two separate projects that are similarly named, and then archive them together as two separate but related projects? I see that one can export a session to CSV format, but there is no option for importing a session. I guess that means there is no way to collaborate (as is now allowed via send/receive in Paratext and FieldWorks Beta 8). Am I missing anything here?
Finally, in entering project-level metadata, what can be done about having multiple separate groups that fall under the same language? The dialects are divergent enough that they are separate translation teams, but this hasn't been updated in ISO standards yet. All I can think to do is to select the same vernacular ISO code for each of those groups, but then the Project Title will discriminate this difference.

I apologize for the length of this post. I invite your comments and suggestions, or if you have any other ideas that may help us as we move forward. I trust this course will be a good opportunity to put SayMore to the test in a number of areas with regard to its user interface, its naming conventions, etc. We will be sure to report back on how things went!

Thanks,

Ryan Pennington

SIL PNG

John Hatton

Feb 20, 2014, 11:54:31 AM

to say...@googlegroups.com

Hi Ryan,

>I am helping to prepare a workflow for a language discovery course in Papua New Guinea in April. This course will involve 10 different Papua New Guinean translation teams. We plan to collect texts with portable recorders, followed by importing them into SayMore for basic annotation. The preference will be written transcription and translation, but at the minimum we hope for every file to at least be annotated orally. Some of these files may then be imported into FLEx for interlinearization, while some annotations may be pulled out for use with literacy materials.

>I have a few questions to ask regarding our workflow, so please feel free to answer any of them. This is an exciting opportunity to collect data from many under-analyzed languages of Sandaun Province, and to teach Papua New Guineans about linguistic principles.

Sounds exciting. As you’ve probably noticed, SayMore isn’t as complex as ELAN, but neither is it intended to be as easy as something like SayMore. The persona I keep in mind is generally a PNG university student. That said, the oral/written transcription tools do try to be simple enough that someone without a lot of computer savvy can operate them. I don’t think we’ve ever gotten the user interaction in BOLD tools quite right, and I think some kind of guided training/practice would help. If you write something up and can share it with the rest of us, we’d appreciate it.

>My first question is: How can we protect ourselves from computer failure? Each of the ten groups will have a computer, and some of them may have multiple computers. How can we back up SayMore data? With our last language discovery course we used a local server onto which we backed up FLEx projects. I recognize that these files will necessarily be much larger in file size. The idea we have so far is to have SyncBack automatically back up the SayMore project folder. Are there any other options?

Others have used Dropbox, but I know in PNG that would be too expensive and slow.
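For an offline option along the lines of the SyncBack idea, here is a minimal sketch of a one-way mirror from a project folder to a backup location such as a local server share. It is only an illustration, not a SayMore feature: the function name `mirror_folder` and the example paths are hypothetical, and it assumes that a file with the same size and a modification time no newer than the backup copy does not need to be copied again.

```python
import shutil
from pathlib import Path


def mirror_folder(source, backup):
    """Copy new or changed files from source to backup.

    Returns the relative paths of the files that were copied.
    Files deleted from source are NOT removed from backup.
    """
    source, backup = Path(source), Path(backup)
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if dst_file.exists():
            s, d = src_file.stat(), dst_file.stat()
            # Assumption: same size and not newer means unchanged.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 preserves the timestamp
        copied.append(rel)
    return copied


# Hypothetical usage; both paths are placeholders:
# mirror_folder(r"C:\SayMore\TeamA", r"\\courseserver\backups\TeamA")
```

Because unchanged recordings are skipped, only the first run has to move the large audio files; later runs mostly pick up small annotation and metadata changes.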

>My second question is: Is there any way to archive an entire project in RAMP? I am trying to protect myself from having to return after the course is over, and separately archive 10 sessions for 10 different groups. I recognize that the various recordings constitute separate 'events', with different sets of metadata… Still, what else can be done? Can I simply use SayMore to create IMDI files and then RAMP those files? If that is the case, then perhaps I would need to open those projects in Arbil and do some bulk editing of certain metadata fields to ensure consistency. Do you have any thoughts on the issue of archiving?

Version 3’s IMDI support does leapfrog our REAP support. We have a project proposal in the works that would bring REAP support up to par and also do some more things with FLExText export. (SIL Language Software Development now works off a system where projects are championed/written up, then ranked, and then a team is assigned.) I think Tim Gaved is going to be the “champion” for that one; if you can email him nice quotes, he can put them in the proposal to strengthen its appeal.

>My third question is: How might two different users (on two different computers) collaborate? Maybe we would just have to create two separate projects that are similarly named, and then archive them together as two separate but related projects? I see that one can export a session to CSV format, but there is no option for importing a session. I guess that means there is no way to collaborate (as is now allowed via send/receive in Paratext and FieldWorks Beta 8). Am I missing anything here?

SayMore’s “database” is just a collection of carefully named folders and files. So you should be able to combine sessions and people between SayMore projects just by moving their folders. To see how, dig into a SayMore project folder and look around. Marlon, I suppose a help topic would be good for this, if you don’t already have one.
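As a rough illustration of combining projects by moving folders, here is a minimal sketch that copies session and person folders from one project folder into another. It assumes each project keeps one subfolder per item under folders named `Sessions` and `People`; check that against what you actually see inside your own project folder. The function name `merge_project_folders` is hypothetical, and folders whose names already exist in the destination are skipped rather than overwritten, so they can be resolved by hand.

```python
import shutil
from pathlib import Path


def merge_project_folders(src_project, dst_project,
                          kinds=("Sessions", "People")):
    """Copy item folders from src_project into dst_project.

    Returns (copied, skipped) as lists of "Kind/Name" labels.
    The folder names in `kinds` are an assumption about the layout.
    """
    copied, skipped = [], []
    for kind in kinds:
        src_dir = Path(src_project) / kind
        if not src_dir.is_dir():
            continue
        for item in sorted(src_dir.iterdir()):
            if not item.is_dir():
                continue
            target = Path(dst_project) / kind / item.name
            label = f"{kind}/{item.name}"
            if target.exists():
                # Never overwrite: a name clash needs a human decision.
                skipped.append(label)
                continue
            shutil.copytree(item, target)
            copied.append(label)
    return copied, skipped
```

Working on copies and keeping both original projects until the merged one opens correctly in SayMore would be the cautious way to try this.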

>Finally, in entering project-level metadata, what can be done about having multiple separate groups that fall under the same language? The dialects are divergent enough that they are separate translation teams, but this hasn't been updated in ISO standards yet. All I can think to do is to select the same vernacular ISO code for each of those groups, but then the Project Title will discriminate this difference.

Sounds good. You will probably want to record the dialect as a custom property of both the recording and the speakers.

John Hatton

Senior Software Engineer/Program Manager

Language Software Development

SIL International

Hugh Paterson III

Feb 20, 2014, 7:02:31 PM

to say...@googlegroups.com, DirectorLanguage-CultureArchives Intl

Ryan,

Several things - but not an answer to all things.

Re question #2:

If these unique texts (oral or written) are to be made available to other systems on a per-text basis, then each text will need to be uploaded to REAP via RAMP independently. To my knowledge, when SayMore exports a package to RAMP it exports the whole project lock, stock, and barrel. That means the whole thing can be retrieved, but then SayMore will have to be used to access a given event's metadata or files. In my experience, people who retrieve archived materials generally don't want the whole "project"; they only want a single text (at a time, anyway). If you want to get around the man-hour cost of archiving these objects, get 10 people to each spend one hour on the task instead of one person spending 10 hours. Make the archiving part of the course. Be aware that only SIL members can archive to REAP, and only content that is copyright SIL can be accessioned to the SIL archive. I do not know your situation, but SIL archiving may not even be an option for you, at which point the RAMP issue becomes moot.

Re question #4:

Separate translation teams and divergent dialects can be, and are, catalogued under the same ISO code. In RAMP and in REAP the information architecture has a dialect option. So the correct way to handle this is to give the projects the same ISO code and then record the projects' dialectal differences in a separate metadata element. This can also be done in SayMore (that is, in a description of the project), and then again when uploading this content to an archive.

- Hope this helps.

- Hugh Paterson


John Hatton

Feb 21, 2014, 4:37:01 PM

to say...@googlegroups.com

> To my knowledge, when SayMore exports a package to RAMP it exports the whole project lock, stock, and barrel.

To be clear about terms... SayMore 3 doesn’t have a project archiving option for REAP, only for individual sessions.

> That means the whole thing can be retrieved, but then SayMore will have to be used to access a given event's metadata or files.

To make sure no one misunderstands: no one will ever need SayMore to access the archived contents. Archiving files that required some niche software to read would be a bad idea. Rather, the files SayMore uses (and submits to RAMP) are separate, easily readable files. It uses a trivial XML text file for metadata. For time-aligned transcriptions, it uses ELAN’s XML format.
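To illustrate the point about readability, here is a minimal sketch that pulls time-aligned segments out of an ELAN-format (.eaf) document using only the Python standard library. The sample document is a hand-written, trimmed-down stand-in rather than real SayMore output, and the tier name `Transcription` and the function name `read_segments` are assumptions made for the example. It only reads time-aligned annotations; tiers that refer to other annotations (ELAN's `REF_ANNOTATION` elements) are ignored.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in ELAN's XML layout (not real SayMore output).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="Transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example text</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_segments(eaf_xml):
    """Return (tier_id, start_ms, end_ms, text) for each aligned annotation."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            segments.append((tier.get("TIER_ID"), start, end, text))
    return segments


for segment in read_segments(SAMPLE_EAF):
    print(segment)
```

For a file on disk, the same function could be fed the file's text, for example `read_segments(Path("some_file.eaf").read_text(encoding="utf-8"))`, where the file name is a placeholder.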

John Hatton

Senior Software Engineer/Program Manager

Language Software Development

SIL International

