Server-side solution for automated text captions


Reimar Bauer

Jan 6, 2021, 3:06:56 AM
to bigblueb...@googlegroups.com
Hi there,

We got a request for a barrier-free meeting.

Some participants need everything that is spoken as text.
I know that I can use the dictation function on my own machine and have
everything I say (voice2text)
pushed as a text caption onto the slides.

I would prefer to have this done automatically on the server side, at
least for the presenter.

Does anyone have an idea or solution for how to do this?


regards
Reimar

Reimar Bauer

Jan 7, 2021, 2:48:03 AM
to bigblueb...@googlegroups.com
I googled a bit and found some ideas:

https://medium.com/sptmru/how-to-create-a-voice-ivr-based-on-freeswitch-sphinx-and-node-js-d143ed44efe3
https://github.com/sptmru/voiceivr

I had not thought about using a dial-in user that connects to a
voice2text API.

Do you know of such a service?

If something like this is available, then for a simple use case we may
only need to send the result to a special private chat.

sd...@distancelearning.cloud

Jan 7, 2021, 1:21:46 PM
to bigblueb...@googlegroups.com
Do you want all users' audio to be converted to text, or just the presenter's?

You can take live audio from FreeSWITCH, send it to the Google or IBM speech engine, and insert the converted text back into a chat channel through Redis.
Technically doable, but it needs to be specced out to decide how best to implement it.

You can take a single user's audio, or the mix of all users through mod_conference in FreeSWITCH.
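
To make the flow above a bit more concrete, here is a minimal sketch of the three stages (audio in, speech-to-text, text out to chat). Everything in it is an assumption for illustration: `Caption`, `caption_pipeline`, `recognize` and `publish` are hypothetical names, the recognizer is a stand-in for whichever speech engine is chosen, and `publish` is a stand-in for whatever writes to the chat channel. It does not use any real FreeSWITCH, Redis or BigBlueButton API or message format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Caption:
    meeting_id: str  # hypothetical identifier of the target meeting
    speaker: str     # e.g. "presenter", a user name, or "mix" for the conference mix
    text: str        # recognized text for one audio chunk


def caption_pipeline(
    meeting_id: str,
    speaker: str,
    audio_chunks: Iterable[bytes],
    recognize: Callable[[bytes], str],
    publish: Callable[[Caption], None],
) -> int:
    """Run each audio chunk through the recognizer and publish non-empty text.

    Returns the number of captions that were published.
    """
    published = 0
    for chunk in audio_chunks:
        text = recognize(chunk).strip()
        if not text:
            continue  # silence, or nothing was recognized in this chunk
        publish(Caption(meeting_id, speaker, text))
        published += 1
    return published


if __name__ == "__main__":
    # Stand-ins so the sketch runs without FreeSWITCH, a speech engine or Redis.
    fake_audio = [b"\x00\x00" * 160, b"\x01\x02" * 160]
    fake_recognizer = lambda chunk: "" if set(chunk) == {0} else "hello everyone"
    sent: List[Caption] = []
    caption_pipeline("demo-meeting", "presenter", fake_audio, fake_recognizer, sent.append)
    for caption in sent:
        print(f"[{caption.speaker}] {caption.text}")
```

Keeping the recognizer and the publisher as plain callables means the same loop would work for a single user's audio or for the conference mix, and the speech engine could be swapped without touching the rest.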

There is also a project for post-processing recordings:
https://github.com/bigbluebutton/deepspeech-web

It may give you some inspiration.

Regards,
Stephen
--
You received this message because you are subscribed to the Google Groups "BigBlueButton-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bigbluebutton-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bigbluebutton-dev/CADfaKO91KDu7OAtVo7poeGsGQhqObxSEkx_E7NDFykQDumjnvA%40mail.gmail.com.

benjamin milde

Jan 11, 2021, 1:46:23 PM
to bigblueb...@googlegroups.com
Hi there,

I'm a researcher at Hamburg University. Several of our MA students worked on improving BBB with various speech recognition features last year. Please have a look at these projects in particular:

BBB Kaldi Connector - https://github.com/3wille/bbb-kaldi-connector - Server-side speech recognition for live subtitles in BBB with Kaldi

BBB Related Items - https://gitlab.rrz.uni-hamburg.de/bay1620/wilps-related-items - Client-side speech recognition for BBB with Kaldi and lookup of related information (slides, wiki articles). Demonstration video: https://www.youtube.com/watch?v=vvlNGN86CaQ

The advantage of using Kaldi (https://github.com/kaldi-asr/kaldi) is that you can host your own voice2text services or run them locally. Both avoid the privacy issues of 3rd-party services. Our open source ASR models for German (https://github.com/uhh-lt/kaldi-tuda-de) are quite good already, and models for other languages are being developed by other people. Let me know if you would like to collaborate to improve these projects.

Best,

Benjamin




Fred Dixon

Jan 11, 2021, 2:55:21 PM
to BigBlueButton-dev
This is very cool!

It directly addresses one of the areas of BigBlueButton (live speech-to-text) that we (the core developers) haven't had a chance to work on.

For bbb-kaldi-connector, what server resources did you need for live speech-to-text in a single meeting, and for multiple simultaneous meetings?

Regards,... Fred




--
BigBlueButton Developer

Like BigBlueButton?  Tweet us at @bigbluebutton

Ali Alhaidary

Jan 11, 2021, 9:43:21 PM
to bigblueb...@googlegroups.com

Excellent work.

Yes, I am very much interested in this for Arabic. I would appreciate more details.

Ali

benjamin milde

Jan 12, 2021, 11:28:18 AM
to bigblueb...@googlegroups.com
Hi Fred,

bbb-kaldi-connector is just the connector; kaldi-model-server/pykaldi does the actual ASR (https://github.com/uhh-lt/kaldi-model-server). In terms of resources, it needs about 1 CPU core per decoder stream and about 4 GB of memory per instance with our model and default settings. We had it running in real time on somewhat modest CPUs from 4-5 years ago. GPUs are only needed for training the models. That said, various parameters can be tuned to trade accuracy against speed or memory size, so the hardware requirements can be adjusted to what makes sense for a particular deployment.
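
As a rough back-of-the-envelope aid, the figures above (about 1 CPU core per decoder stream, about 4 GB of memory per instance) can be turned into a small estimate. This is only a sketch: `AsrFootprint` and `estimate_footprint` are hypothetical names, and the default of one decoder stream per instance is an assumption, since the message does not say how many streams a single instance can serve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrFootprint:
    cpu_cores: int
    memory_gb: int


def estimate_footprint(
    decoder_streams: int,
    cores_per_stream: int = 1,        # "about 1 CPU core per decoder stream"
    memory_gb_per_instance: int = 4,  # "about 4 GB ... per instance"
    streams_per_instance: int = 1,    # ASSUMPTION: not stated in the thread
) -> AsrFootprint:
    """Estimate CPU cores and memory for a number of concurrent decoder streams."""
    if decoder_streams < 0 or streams_per_instance < 1:
        raise ValueError("need decoder_streams >= 0 and streams_per_instance >= 1")
    # Ceiling division: a partially filled instance still costs a full instance.
    instances = -(-decoder_streams // streams_per_instance)
    return AsrFootprint(
        cpu_cores=decoder_streams * cores_per_stream,
        memory_gb=instances * memory_gb_per_instance,
    )


if __name__ == "__main__":
    # Ten simultaneous meetings with one decoder stream each.
    print(estimate_footprint(10))
```

Real numbers will depend on the model and the decoder settings, so this is a starting point for sizing, not a measurement.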

While you can have one decoder stream for every participant, that doesn't scale well for larger meetings. Having one decoding stream per meeting and just using the multiplexed (mixed) signal is probably the way to go for server-side ASR. We are thinking of ways to make the decoder and the i-vector speaker adaptation aware of speaker changes from BBB. If done correctly, this would help the model adapt to n speakers concurrently, instead of having a single speaker embedding in which all meeting participants are mixed together.
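
For readers unfamiliar with what a mixed signal means here, the sketch below sums several participants' audio frames into one frame that a single decoder stream could consume. It is purely illustrative: in a real deployment FreeSWITCH's conference already produces the mix, `mix_pcm16` is a hypothetical helper, and the frame format (signed 16-bit PCM in native byte order, equal frame lengths) is an assumption.

```python
from array import array
from typing import Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(frames: Sequence[bytes]) -> bytes:
    """Mix equally long frames of signed 16-bit PCM into a single frame.

    Samples are summed per position and clipped to the 16-bit range.
    """
    if not frames:
        return b""
    length = len(frames[0])
    if length % 2 or any(len(frame) != length for frame in frames):
        raise ValueError("frames must have the same, even length in bytes")
    totals = [0] * (length // 2)
    for frame in frames:
        for i, sample in enumerate(array("h", frame)):
            totals[i] += sample
    clipped = (max(INT16_MIN, min(INT16_MAX, total)) for total in totals)
    return array("h", clipped).tobytes()


if __name__ == "__main__":
    alice = array("h", [1000, -2000, 30000]).tobytes()
    bob = array("h", [500, -500, 10000]).tobytes()
    print(list(array("h", mix_pcm16([alice, bob]))))
```

With the mix, the number of decoder streams stays at one per meeting no matter how many participants join, which is the scaling argument made above. The cost is that speaker identity is lost in the signal, which is why speaker-change information from BBB would be useful.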

We also have some issues with connection stability between bbb-kaldi-connector and BBB. In addition, the client-side ASR has noticeably better recognition accuracy than the server-side one (which works on the signal from FreeSWITCH). I would be happy to have a quick chat with you, in BBB of course. :) Ideally, we would also like to make this feature more accessible and easier to install, and/or contribute it to the mainline BBB server.

Best,

Benjamin

Reimar Bauer

Jan 12, 2021, 1:42:20 PM
to bigblueb...@googlegroups.com
Hi Benjamin,

This looks almost exactly like the feature we need. Integration
into the mainline BBB server would also be a great idea.
I am fascinated by the many good ideas from the universities involved
in this project.

best regards
Reimar

sd...@distancelearning.cloud

Jan 12, 2021, 2:03:56 PM
to bigblueb...@googlegroups.com
Hi Benjamin, great work.

My question is about the accuracy of the server-side models for English.

Have you benchmarked their accuracy against the Google/Amazon/IBM cloud services?

I assume the models will get better over time.

Regards,
Stephen

