Week 1 Progress Report

Anirban Banik

16.05.2018, 05:23:28

to beaglebo...@googlegroups.com

Hi everyone. This is my Week 1 progress report. The coding period has only just started, so I haven't got far yet, but I have made some progress.

Work done:

1. Worked on a script to track the recognition accuracy of pocketsphinx as a function of the characteristics of the audio input: its sampling rate, chunk size, etc. This should in turn help decode speech to text more accurately later on.

2. Made a script that records audio from the microphone and splits it at intervals of silence. As soon as an audio file is created, it pauses recording, performs whatever tasks need to be done with that audio input, and then resumes recording.

3. Working on using the script from item 2 to launch other processes, especially the games.
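
For illustration, the silence-based splitting in item 2 could look roughly like the sketch below. This is not the project's actual code: it works on a plain list of integer samples instead of a live microphone stream, and the function names, the RMS energy test, and the default values are all assumptions.

```python
import math


def rms(chunk):
    """Root-mean-square amplitude of one chunk of integer samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def split_on_silence(samples, chunk_size=4, threshold=500.0, min_silent_chunks=2):
    """Split samples into utterances separated by runs of quiet chunks.

    chunk_size, threshold and min_silent_chunks are illustrative
    assumptions, not values taken from the project.
    """
    segments = []
    current = []
    silent_run = 0
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        if rms(chunk) >= threshold:
            # Loud chunk: it belongs to the current utterance.
            current.extend(chunk)
            silent_run = 0
        else:
            # Quiet chunk: it is dropped from the output.
            silent_run += 1
            # Enough consecutive quiet chunks closes the current utterance.
            if current and silent_run >= min_silent_chunks:
                segments.append(current)
                current = []
    if current:
        segments.append(current)
    return segments
```

In a real recorder each returned segment would be written to its own audio file, and recording would stay paused while that file is processed.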

Work to be done next week:

1. Checking the accuracy statistics on clean audio as well as noisy audio.

2. Improving the voice_input part of the games already created, that is, Hangman and Spell It!

3. Working on launching the entire process at boot-time.

4. Deploying the entire thing on a PocketBeagle with the hardware. I could not start work on the board because I have not yet received the complete set of components.
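
As a rough sketch of how the accuracy statistics could be tallied, assuming each test clip is labelled with the condition it was recorded under (for example a (sampling rate, chunk size) pair, or a tag such as "noisy") together with the expected word and the text the decoder returned. The function name and the record layout are assumptions; the decoding step itself is left out.

```python
from collections import defaultdict


def accuracy_by_condition(results):
    """Return {condition: fraction of clips decoded correctly}.

    results is an iterable of (condition, expected, decoded) tuples.
    The comparison ignores case and surrounding whitespace.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for condition, expected, decoded in results:
        totals[condition] += 1
        if decoded.strip().lower() == expected.strip().lower():
            hits[condition] += 1
    return {condition: hits[condition] / totals[condition] for condition in totals}
```

Comparing the resulting fractions across conditions shows which input settings the recognizer copes with best.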

Issues faced:

1. The recognition accuracy for individual letters is very poor. I have not yet thought of a way to overcome this.

Working repository link:

Main working repository:

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell

Previous working repository before commencement of GSoC coding period:

https://github.com/AnirbanBanik1998/Speak_and_Spell

I will merge all updates into my main working repository only.

Thanks,

Anirban.

Anirban Banik

21.05.2018, 00:36:52

to beaglebo...@googlegroups.com

Hi, this is my progress report for week 1.

Work done:

1. Worked on a script to track the recognition accuracy of pocketsphinx as a function of the characteristics of the audio input: its sampling rate, chunk size, etc. This should in turn help decode speech to text more accurately later on.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Speech_Processing/accuracy_check

2. Made a script that records audio from the microphone and splits it at intervals of silence. As soon as an audio file is created, it pauses recording, performs whatever tasks need to be done with that audio input, and then resumes recording. This code has been reused in different forms in many parts of the project, as and when required.

3. Worked on using the script from item 2 to launch other processes, especially the games. It launches the games successfully, provided the keywords are recognized well.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Game/Game_launcher

4. Had to revamp the previously created Spell It! game. The new version takes inputs smoothly and operates on them.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Game/Games/Spell_It!

5. Did the same for the Hangman game too.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Game/Games/Hangman

6. Working on scraping more words to enhance the wordlist.
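
The keyword-driven launching in item 3 could be sketched as below. This is only an illustration of the idea: the keyword table, the script paths, and the function names are hypothetical placeholders and are not taken from the Game_launcher directory.

```python
import subprocess
import sys

# Hypothetical keyword -> script mapping; the paths are placeholders.
GAMES = {
    "hangman": "Games/Hangman/hangman.py",
    "spell": "Games/Spell_It/spell_it.py",
}


def pick_game(transcript, games=GAMES):
    """Return the script for the first known keyword in the transcript, or None."""
    for word in transcript.lower().split():
        if word in games:
            return games[word]
    return None


def launch(transcript, games=GAMES):
    """Run the matching game, if any, and return its exit code."""
    script = pick_game(transcript, games)
    if script is None:
        return None
    # subprocess.run blocks until the game exits, so the caller can
    # resume recording only after the game has finished.
    return subprocess.run([sys.executable, script]).returncode
```

For example, pick_game("please start hangman") returns the Hangman placeholder path, while a transcript with no known keyword returns None and nothing is launched.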

Issues faced:

1. Had problems running the recording script in parallel with the games, as I had tried previously. Solved the issue by pausing the games to run the recording script as and when required.

2. Set up most of the hardware required to deploy the project, but I am having trouble setting up the headset: the PocketBeagle won't recognize it.

Work to be done next week:

1. Solving the hardware problems and checking progress on the board.

2. Working on the third game; it will be done soon.

3. Have to start the 4th game (crossword) from scratch.

4. Some recording scripts have been copied and used in different forms in different directories. I want to reduce the project's size by reusing the same code in those places.

Regards,

Anirban

Anuj Deshpande

23.05.2018, 15:43:03

to beaglebo...@googlegroups.com

Hi Anirban

As per our discussion on IRC, I have used AWS Polly to create sound samples for you to test your work with. There are around 1500 samples in total (~40 voices saying 36 words).

- The data can be found attached. I didn't actually play all the files in this .zip, but I made sure that they were non-null files and checked a few at random.

- The script I used can be found here. You need the AWS CLI installed and provisioned with access to the Polly service. I understand that this might not be possible for you to set up, as it's behind a paywall. If you need any more stuff generated, please let me know.

- I had second thoughts about using this data, as I felt that it might not be wise to use "computer generated" voice samples to test the hit rate of a voice recognition piece. But the data that you have provided here is straightforward: simple words and letters of the alphabet, something I am fairly sure Amazon used a person to generate. To the best of my knowledge, it's only the sentences that Amazon stitches together using AI. (Other mentors would love to hear your thoughts on this.)
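
For readers without access to the script mentioned above, here is a minimal sketch of how such a batch could be generated. It is not that script: it only builds one `aws polly synthesize-speech` command per (voice, word) pair, and the function name and output file naming are assumptions. Running the commands requires the AWS CLI with credentials that can call Polly.

```python
def polly_commands(voices, words, output_format="mp3"):
    """Build one AWS CLI command (as an argument list) per (voice, word) pair.

    Output files are named <voice>_<word>.<format>; this naming scheme
    is an assumption made for the example.
    """
    commands = []
    for voice in voices:
        for word in words:
            outfile = f"{voice}_{word}.{output_format}"
            commands.append([
                "aws", "polly", "synthesize-speech",
                "--output-format", output_format,
                "--voice-id", voice,
                "--text", word,
                outfile,
            ])
    return commands
```

Each argument list can be passed to subprocess.run(cmd, check=True). With 40 voices and 36 words this yields 1440 commands, in line with the "around 1500" figure above.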

Regards

Anuj

--
https://beagleboard.org/gsoc
---
You received this message because you are subscribed to the Google Groups "BeagleBoard GSoC" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard-gs...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

samples.zip
