Amara autocaptions for Coursera videos


Claude Almansi

Feb 22, 2013, 7:54:10 PM
to universal-subt...@googlegroups.com
I am sharing this post about Amara Bot captions for Coursera videos with Amara's Deaf & Hard of Hearing Discussion List for the following reasons:

- Coursera provides courses from 62 universities to "a student base of 2.7 million" (1), which likely includes quite a few Deaf and Hard of Hearing people, for whom the way course videos are captioned is a crucial issue.
- My earlier questions about these Bot captions, posted on the Amara Help forum on Dec. 14, 2012, remain unanswered. In the meantime, I've signed up for a Coursera course: this gave me access to more, but also raised further questions.

Course participant's viewpoint

The Coursera course I signed up for is Ecole Polytechnique Fédérale de Lausanne's Digital Signal Processing. I chose it out of interest in the subject, but also because EPFL's Recteur took a sabbatical from rectorship to examine MOOCs and how to implement them at EPFL, so it might be a topic for the column on online resources I write for the Swiss bi-monthly Spendere Meglio.

In the first few days, heaps of people were asking on the course forum for the captioning of the video lectures, which had been slightly delayed. Strikingly, the EPFL moderator kept replying that Coursera had been apprised of the issue and was dealing with it.  Then one thread was about asking the course organizers to provide translated subtitles. As if the course conveners not only hadn't told the participants about the Amara Coursera team, but didn't know about it themselves.

Then the captions arrived: good autocaptions, but autocaptions all the same, in dire need of human revision, especially for this DSP course's videos, which contain a lot of calculus expressions. When such an expression is arbitrarily split between two subtitles, or when "omega" is transcribed as "all my guys", this is a problem for Deaf and Hard of Hearing students, but also for those who download the video's transcript, which is derived directly from the captions, to study from it.
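To illustrate why caption errors carry over into the downloadable transcript, here is a minimal Python sketch. It assumes the transcript is simply the caption texts joined in order, which is my reading of how it "emanates" from the captions, not a documented Coursera or Amara process. The two-cue SRT sample and the function names are made up for the example; only the "omega" / "all my guys" error comes from the actual captions.

```python
import re

# Hypothetical two-cue sample: the expression is split across cues and
# "omega" has been misrecognized as "all my guys".
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,000
so the frequency all my guys

2
00:00:04,000 --> 00:00:07,000
equals two pi over N
"""


def parse_srt(text):
    """Return a list of (start, end, text) cues from SRT-formatted text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = [t.strip() for t in lines[1].split("-->")]
        cues.append((start, end, " ".join(lines[2:])))
    return cues


def transcript_from_cues(cues):
    """Join cue texts in order (assumed model of transcript generation)."""
    return " ".join(text for _, _, text in cues)


cues = parse_srt(SAMPLE_SRT)
transcript = transcript_from_cues(cues)
print(transcript)
```

Under that assumption, nothing between the captions and the transcript can repair a misrecognized word, so the only fix is a human edit of the captions themselves.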

Amara Coursera team

Bot-ruled

So I went to the Amara Coursera team, to see what was happening. Not much: it feels like Sleeping Beauty's castle, with just the jngiam bot frantically adding videos and spinning captions away like crazy in its garret. Back in December, when I posted about the Coursera bot on the Amara help forum, it was stanford-bot that did those things, uploading autocaptions "Generated by Cogi" and, when there already were human ones, deleting them, their revisions, and any translations made from them. stanford-bot seems to have (been?) retired at Xmas. There's no upload indication for jngiam's autocaptions. And occasionally, a non-member human ventures into Sleeping Beauty's castle, fixes or translates a set of captions and leaves.

Opaque "projects"

Then there are the projects, meant to correspond to the Coursera courses, though there are 138 projects while Coursera boasts of offering "300+" courses (2). Anyway, project titles are cryptic, are neither searchable nor listed alphabetically, and no longer have descriptions. So 138 is quite enough to comb through in order to find the one for a given course's videos. I eventually fished out the one for Digital Signal Processing.

I applied to join the team, and got no feedback: the jngiam bot is team owner (which adds to the uncanny feeling of this team), but apparently wasn't programmed to process applications. So I did as the other outsider humans did (who may also have pending applications): I started revising the DSP videos without joining. And fortunately, after I posted about editing captions on the course's forum, another participant, more math-literate than I am, is now re-reviewing them after me.

No info given though it exists

Strikingly, the Amara Coursera team itself does not have any description either, let alone a linked instructions page, as some other teams have: TED's first and foremost.

And that's all the more bizarre as Coursera actually has a very informative wiki page on how to use the Amara team for editing the bot-produced captions. How the heck Coursera considered restricting  reading access to this wiki to Coursera participants as consistent with its being a MOOC provider beats me,  but here's the link: https://share.coursera.org/wiki/index.php/Video_Subtitles: just sign up for any Coursera course to create a Coursera ID that will enable you to view it.
The 4 Finer Points section is particularly useful, because it is written by an actual volunteer subtitler, and thus deals with very important concrete issues, like updating captions by upload messing up their former revisions and their translations. The author reported this as early as March 22, 2012, on the then Universal Subtitles help forum, only to be told that it was "normal". As the present Amara help forum shows, 11 months on, this remains an issue threatening subtitlers' work on all Amara videos.

Suggestions

The videos of Coursera courses must have original captions that can be used by humans who need and/or want to translate them. The only way to achieve that is to have other humans revise them.

But to get humans to do that, Coursera and Amara must be open and transparent about the whole process, including about the way the autocaptions are generated (3). True, these autocaptions are unfit for use by people who need them, but they are good enough to be easily edited - provided people know they are dealing with autocaptions.
https://share.coursera.org/wiki/index.php/Video_Subtitles is a good starting point for sharing this info, though it should a) be made publicly viewable; b) be completed. For instance, the syncing between Amara video pages and the Coursera videos, srt files and transcripts should be explained (otherwise participants who download the transcripts might not have the last version).
This information should be shared with all teachers providing Coursera courses, and also added to each course's pages, with a direct link to the relevant Amara team's "project" containing each course's videos. And it should be added to an Amara Coursera team's description (to be created).

Then humans should regain active control of the Amara Coursera team: a team owned by a bot, who is the only active participant in it,  is utterly unappealing - and unserious.

Then an interactive venue should be created for discussing caption editing: Amara's top-down-only messaging is particularly unfit in this case, even if each course convener were made team admin. But Coursera could automatically create a board for that in each course's forum, or incite course conveners to do so, either there, or externally.

Coursera and Amara must live up to their claims about quality international subtitling of Coursera's videos, lest these claims boomerang against them. The Amara Coursera team is linked to in the Amara Enterprise Offerings / Pro Services (4) page. But the team's landslide of unedited, unusable and hence untranslatable autocaptions kind of jars with the quality boasts of that page.

Best,

Claude


(1) Coursera's newsletter dated Feb 22, 2013.
(2) Ibid.
(3) The revisions of https://share.coursera.org/wiki/index.php/Video_Subtitles show that the automatic nature of the stanford-bot's captions was first denied: who knows how many volunteers were put off by this counter-factual denial?
(4) The title tag says "Enterprise Offerings | Amara - Caption, translate, subtitle, and transcribe videos." but the top menu link says "Pro Services" - which is it?

Claude Almansi

Feb 23, 2013, 6:08:37 PM
to universal-subt...@googlegroups.com
Partial correction to the message I posted yesterday, re:

(...) So I went to the Amara Coursera team, to see what was happening. Not much: it feels like Sleeping Beauty's castle, with just the jngiam bot frantically adding videos and spinning captions away like crazy in its garret. (...)

Someone pointed out to me that the jngiam ID might initially have been used by a human being, and the first of the ca 7'400 activities listed in its/his profile page since March 7, 2012 (almost 2'200 since the beginning of this month), seem to confirm this hypothesis. However, as this ID has now been snatched by or ceded to the bot, this erstwhile human ownership is only of academic  interest: the fact remains that the Coursera team is presently bot-owned, and that's a severe anomaly. And the fact remains that Deaf and Hard of Hearing Coursera students are being given substandard access to course videos.

Also: how come this jngiam ID can add captions without setting a language for the videos, whereas we human volunteers have to, even when none of the droplist languages fit?
 

Dawn Jones

Feb 23, 2013, 6:36:20 PM
to universal-subt...@googlegroups.com
This is really depressing to read. I am also surprised and dare I say disappointed that Amara has allowed it to happen given this is supposed to be a pro account. Shouldn't this be monitored? And are Amara going to address this? I completely agree with your last paragraph Claude. It must've taken you some time to so eloquently write your findings. The least you deserve is a response.

Dawn

Claude Almansi

Feb 24, 2013, 8:56:00 AM
to universal-subt...@googlegroups.com
Thank you, Dawn.

Actually, one thing comes out from the page about subtitling in
Coursera's closed wiki and from the Activity pages for the Amara
Coursera team and for each of its videos: the total lack of official
information about the bot and how it works probably played a great
part in human Coursera participants giving up subtitling. I.e. people
got discouraged because when the bot belatedly uploaded its
autocaptions, they not only destroyed better human captions, but also
messed up or even destroyed subs translated in other languages from
the human captions.

Fortunately, it's not too late: with the vast and continued media
coverage of Coursera, it's probably going to remain there and grow for
a long time. However, it is imperative that all course participants be
informed on how the bot works, how they can edit its captions and then
translate the edited version. And on how to find the videos for their
course: the team's search engine does not always work, so for each
course, participants should get a direct link to the corresponding
project containing their course's videos.
Moreover as the page about subtitling in the Coursera closed wiki
says: "If you choose to contribute to the subtitling process, it will
enhance your own learning of the content, so there's benefit for
subtlitlers, as well.". That argument could be used too.
Useful public resource for preparing this info:
https://github.com/acli/Coursera-subtitles#things-to-watch-out-for-if-you-want-to-work-on-courseras-subtitles,
which reflects the perception of the bot and its effects by an actual
volunteer subtitler.

Moreover, the team owners/admins should wake up and kick the bot out
of the jngiam owner's ID it presently squats.

@ the Coursera team owners/admins: please answer the question in my
former message: how come this jngiam ID can add captions without
setting a language for the videos, whereas we human volunteers have
to, even when none of the droplist languages fit?

Best,

Claude

Alan Kelly

Sep 19, 2013, 5:15:03 PM
to universal-subt...@googlegroups.com
Claude and Dawn and other readers,

It is my informed opinion that the humans who are managing the bots don't really care about accuracy. If those humans (wherever they are) did care, the bots would not be overwriting and/or deleting the human-produced work of creating true and accurate transcription and captioning.

The human-run Amara organization does not appear to be transparent as far as its procedures and operations are concerned. I was an early adopter a few years ago, but quickly learned they were not interesting to work with, for free. If I recall well, one branch was soliciting volunteers for the creation of closed-captions for Netflix videos.

Positively yours in truth and accuracy

Alan Kelly
VerbatimIT

Claude Almansi

Sep 21, 2013, 9:31:43 AM
to universal-subt...@googlegroups.com
Hi Alan,

Thank you for your reply. Some updates about what happened since the
beginning of this thread back in February:

Coursera staff stopped adding videos to their Amara team at the end of
February, and deleted the team at the end of March, announcing a new
internationalizing tool to come. They eventually described it in
http://blog.coursera.org/post/50452652317/coursera-partnering-with-top-global-organizations
on May 14: translated subtitles to be outsourced to a few academic
institutions, in a few languages: "Russian, Portuguese, Turkish,
Japanese, Ukrainian, Kazakh, and Arabic" (and later on, Chinese - but
e.g. no Spanish or French). These translated subtitles were meant to
be implemented by September, but they don't seem to have been.
Coursera staff still seem in denial of the need for accurate captions
in the video's language, both for deaf users and for translators. They
also planned to have the translations done via Transifex: while it's
possible to translate a subtitle file there (as well as with any
collaborative writing tool), Transifex does not offer the possibility
to check the translated subtitles in a player.

The description of Amara's services in
http://about.amara.org/enterprise/ is now clearer: free tool, paying
creation of a team of unpaid volunteers, paying subtitling by paid
subtitlers, with more info for the latter in
http://about.amara.org/order-subtitles/ .

Even if Coursera and Amara have now parted ways, they still have
things in common:

1) They both seem to strongly underestimate the work needed by non
professionals to subtitle a long video: up to one hour for some
Coursera lectures, ca. an hour and a half for most "paid subtitling"
Amara videos, where the subtitling must moreover conform to
Hollywood's complex diktats for cinema / TV. You're a pro, Alan, but for a
non pro, it's much harder to subtitle one 90-minute video than five
20-minute ones, especially if collaboration - perhaps the main feature
that made Amara's initial success when it was called Universal
subtitles - is excluded, as in the paid option.

2) For the original subs from which translations are meant to be done,
they both use a transcript produced by voice recognition with some
human editing, without announcing it. As the human brain favors what
it sees over what it hears, if subtitlers are not warned, they are
less likely to spot voice recognition mistakes. Moreover, these
transcripts are already sliced to fit Hollywood's diktat about
subtitle line length when they get uploaded, and this makes editing
them more complicated and discouraging.

So re Coursera: let them go on sabotaging subtitling on their own:
they have amply demonstrated that "internationalization" is only a
sales argument for them, that they don't care a hoot about it, and
that they care even less about accessibility.

However re Amara, it'd be great if they'd stick to what is still
written on their website:
"Help organizations and deaf and hard of hearing viewers make videos
accessible around the world." (Home page)
"Amara gives individuals, communities, and larger organizations the
power to overcome accessibility and language barriers for online
video." (About page).
"for online videos", i.e. not according to Hollywood's byzantine
ukases concerning subtitling for cinemas and TV, with their artificial
creation of "SDH subtitling" only for deaf people who understand the
original language, line length limitation etc. that complicate the
work of non pro subtitlers.
If video producers want all the Hollywood frills, then they should
budget for pro subtitling, instead of trying to get away with their
legal subtitling obligations "for a fraction of what [they] may be
paying today (often 50% less)." (Order Subtitles page).

Best,

Claude