If you are a student interested in working on this project for GSoC, please post your questions/introductions in this thread.
The current workflow for adding new audio
1. People contact us via te...@tatoeba.org to let us know they are interested in contributing audio[1].
2. Once we have validated that their audio is good, they record more audio and they send it to CK.
3. CK uploads the files to the server (he only has SFTP access), then notifies us via te...@tatoeba.org that a new batch of audio has been uploaded.
The way audio is handled in Tatoeba
* The database has a table `sentences` in which there is a field `hasaudio`.
* As mentioned in the doc[3], the field can have 3 values: no, shtooka or from_users. At this point, we don't use from_users.
* When hasaudio = shtooka, the interface displays the "audio available" icon. When hasaudio = no, it displays the "audio unavailable" icon.
* Audio files for sentences are stored in a folder that is divided into several subfolders: one for each language.
* Each audio file is named after the sentence's id and stored in the corresponding language folder.
* From the web, the files in this folder can be accessed via the subdomain audio.tatoeba.org. For instance, the audio for sentence #61, which is a Chinese sentence, is accessible via http://audio.tatoeba.org/sentences/cmn/61.mp3.
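
To make the conventions above concrete, here is a minimal Python sketch of how the audio URL and the icon could be derived from a sentence's language, id and `hasaudio` value. This is not the actual Tatoeba code: the function names `audio_url` and `audio_icon` are made up for illustration, and since nothing above says which icon would go with `from_users`, the sketch simply refuses that value.

```python
# Illustrative sketch only; names below are hypothetical, not Tatoeba's code.

AUDIO_BASE_URL = "http://audio.tatoeba.org/sentences"


def audio_url(lang_code: str, sentence_id: int) -> str:
    """Build the URL of an audio file: one folder per language,
    file named after the sentence's id."""
    return f"{AUDIO_BASE_URL}/{lang_code}/{sentence_id}.mp3"


def audio_icon(hasaudio: str) -> str:
    """Map the `hasaudio` field of the `sentences` table to the icon shown."""
    if hasaudio == "shtooka":
        return "audio available"
    if hasaudio == "no":
        return "audio unavailable"
    # `from_users` is not used at this point, so its icon is unspecified
    # here (assumption: treat it, and anything else, as an error).
    raise ValueError(f"unhandled hasaudio value: {hasaudio!r}")


if __name__ == "__main__":
    print(audio_url("cmn", 61))
    print(audio_icon("shtooka"))
```

With sentence #61 (language code `cmn`), `audio_url` gives back the same URL as in the example above.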
The problems we have
Some of the problems were briefly mentioned in the wiki page[4] and on Kakul's Wall thread[5], but I will add some more things here.
1. Making new audio available requires at least 2 people. As a result, it can take a while between the time a contributor submits their audio and the time it becomes available on Tatoeba. This can make people a bit frustrated.
2. Since someone with SSH access is eventually needed to make the audio available, it is more difficult to delegate the task of managing audio, because we can't give SSH access to just anyone.