Hi all,
I'm a PhD candidate in ethnomusicology at UC Berkeley, and I have a research question I'm hoping you can help with. My research focuses on sound communication between Taiwan and China, and especially how sound gets around censorship mechanisms. I'm trying to understand whether there are any technological reasons why audio communication might be more difficult to censor than visual communication. Since AI is central to many censorship tools, I am especially interested in the unique challenges of using sound data with AI.
My understanding thus far is that there were certain developments around 2010 that prompted the use of GPUs for AI and led to huge breakthroughs in AI applications in industry. My question is whether the switch to GPUs also led to a greater focus on visual data, because the physical architecture of GPUs lends itself better to visual than to audio data. Thoughts on this topic? Is there a visual bias in AI research? If so, is this bias technological or cultural? What are some of the unique challenges of using AI technologies with sound data?
Looking forward to hearing your candid feedback, and thanks to Justin Salamon for pointing me toward this listserv.
Best,
Sarah
--
Open-access journal Transactions of ISMIR, open for submissions: https://tismir.ismir.net
---
ISMIR 2021 will take place online, November 8-12, 2021
ISMIR 2022 will take place in Bangalore, India
ISMIR Home -- http://www.ismir.net/
---
On 17 Nov 2021, at 19:00, Sarah Plovnick <sarah.p...@gmail.com> wrote:
Two quick ideas to prompt discussion:

1) One natural way to represent audio data is the spectrogram, which is of course a 2D "picture" of audio over time. Some of the same ML techniques used in image analysis (convolutional neural networks, etc.) can be applied directly to spectrograms, so progress in vision understanding helps audio understanding as well. For instance, see the results of the DCASE audio understanding competition; I think you'll find several CNN-type systems there.
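To make the spectrogram idea concrete, here is a minimal, dependency-free sketch in pure Python (my own illustration, not anyone's production code): it frames a signal, applies a Hann window, and takes a naive per-frame DFT — real systems would use an FFT (e.g. `numpy.fft.rfft`) and usually a mel/log scaling, but the result is the same kind of time-by-frequency "image" that a CNN can consume.

```python
import math
import cmath

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a naive per-frame DFT (illustration only).
    Returns a 2D list: rows = time frames, columns = frequency bins."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Real input -> keep only the non-redundant half of the spectrum.
        bins = []
        for k in range(frame_len // 2):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames  # time x frequency: an "image" of the audio

# Quarter second of a 440 Hz sine sampled at 8 kHz.
sr = 8000
sig = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 4)]
spec = spectrogram(sig)

# The brightest bin of the first frame should sit nearest 440 Hz
# (bin width is sr / frame_len = 31.25 Hz here).
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin * sr / 256)  # estimated peak frequency in Hz
```

Once audio is in this form, it can be stacked into tensors and fed to an off-the-shelf image-style CNN, which is exactly why vision-driven GPU tooling transferred to audio so readily.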