BachBot: Polyphonic model for Bach-like chorales


Feynman Liang

Aug 1, 2016, 11:48:28 AM
to Magenta Discuss
I built a polyphonic LSTM model and trained it on Bach chorales. You can check it out at http://bachbot.com/. I'd love to hear what everyone here thinks about it, and whether there's interest in a port to the Magenta framework.
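For readers curious how polyphony can be fed to a sequence model at all, here is a toy sketch of one common approach: flattening SATB time steps into a single token stream that an LSTM can model. The vocabulary below is invented for illustration and is not BachBot's actual encoding (see the linked thesis for that).

```python
# Toy serialization of a polyphonic chorale into a flat token stream.
# Each time step holds the four SATB MIDI pitches; a chord delimiter
# separates steps so the model sees harmony and rhythm jointly.
# (Illustrative only -- not BachBot's actual vocabulary.)

def encode(time_steps):
    """time_steps: list of (soprano, alto, tenor, bass) MIDI pitch tuples."""
    tokens = ["START"]
    for chord in time_steps:
        tokens.extend(f"NOTE_{p}" for p in chord)
        tokens.append("|||")  # chord delimiter
    tokens.append("END")
    return tokens

steps = [(67, 62, 59, 43), (69, 62, 57, 45)]
print(encode(steps)[:6])  # ['START', 'NOTE_67', 'NOTE_62', 'NOTE_59', 'NOTE_43', '|||']
```

The flat stream lets a standard next-token LSTM handle four voices at once, at the cost of longer sequences than a per-voice encoding.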

Bob Sturm

Aug 2, 2016, 12:39:22 PM
to Magenta Discuss
Hi Feynman. Interesting. I am sure you know of Tom Collins' work:
http://www.tomcollinsresearch.net/

Why do you synthesize the chorales at such a high tempo? The piano
samples and constant velocity make the voice leading really hard to
hear. If they were played by a human (sung by humans), it would be a
lot easier to tell which is Bach.

Why are you performing a discriminate test? You probably know Ariza's
article: C. Ariza, “The interrogator as critic: The Turing test and
the evaluation of generative music systems,” Computer Music J., vol.
33, no. 2, pp. 48–70, 2009.

Feynman Liang

Aug 2, 2016, 2:07:00 PM
to Bob Sturm, Magenta Discuss
Resending this to include magenta-discuss:

Hi Bob,

The fermatas are being modeled (see https://drive.google.com/open?id=0B9RN8w94wuqIWUNidnRLR3RkOWM). When I trained models without accounting for them, I got unrealistically long phrases before resolution back to the tonic.

I think the problem is that the synthesizer (in my case, the Linux program `mscore` from MuseScore) does not account for them. Any suggestions for alternatives?
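One workaround, sketched here over a plain note list (the representation and the 2x stretch factor are assumptions, not what BachBot actually does), is to lengthen fermata-marked notes before the file ever reaches the synthesizer, so the pauses are audible even when the renderer ignores the fermata marking:

```python
# Pre-process notes so fermatas are audible even when the synthesizer
# ignores the marking: stretch the marked notes' durations.
# Sketch over a plain note-list; a real pipeline would apply the same
# idea to the MIDI/MusicXML before handing it to mscore.

def apply_fermatas(notes, stretch=2.0):
    """notes: list of dicts with 'pitch', 'quarter_length', 'fermata'."""
    out = []
    for n in notes:
        n = dict(n)  # copy so the input list is left untouched
        if n.get("fermata"):
            n["quarter_length"] *= stretch
        out.append(n)
    return out

notes = [{"pitch": 60, "quarter_length": 1.0, "fermata": False},
         {"pitch": 67, "quarter_length": 1.0, "fermata": True}]
print(apply_fermatas(notes)[1]["quarter_length"])  # 2.0
```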


Feynman

On Tue, Aug 2, 2016 at 5:38 PM Bob Sturm <bobl...@gmail.com> wrote:
Hi Feynman,

> Thank you for the references: I wasn't aware of either Tom Collins' or
> Ariza's work. They are very relevant and I will be referencing them as I
> write up my thesis.
>
> I didn't put much thought into the tempo and used the default 120BPM
> provided by MuseScore (my MIDI -> MP3 synthesis program). Is there a choice
> of tempo and instrument (aka "SoundFont" in MIDI synthesis) which you think
> would be more realistic?

Listen to this Bach chorale.

video: https://www.youtube.com/watch?v=ZtTaBw5jvm8
notation: http://sporadic.stanford.edu/Chorales/pdf/Chorale268.pdf

The fermatas are necessary. It is not a question of the choice of
sound, but of voicing. One can play these on piano or organ, or sing
them, etc. But if the voicing is not balanced, then it will sound hollow.

> We use a discriminative test on BachBot.com because:
>
> - The target audience (i.e. anyone on the internet with a link) has a diverse
>   range of backgrounds/skill levels and may not be able to provide detailed
>   analysis beyond a simple Turing test
> - The amount of data generated by an online experiment is too large to
>   manually analyze, prohibiting freeform text feedback

I see. This is not a Turing test you are doing, but a musical
discrimination toy test.

> We are also conducting more detailed listener studies within Cambridge's
> Music department.

OK. That will be more interesting!

-Bob.

>
> Again, thank you so much for your interest and feedback!
>
> Feynman



--
Bob L. Sturm, Lecturer in Digital Media
School of Electronic Engineering and Computer Science
Queen Mary University of London
https://highnoongmt.wordpress.com

Douglas Eck

Aug 2, 2016, 2:29:42 PM
to Feynman Liang, Bob Sturm, Magenta Discuss
Hi Feynman, 

Nice generation! I don't suppose you'd want to move this code into TensorFlow / Magenta?  :-)
Douglas Eck | Sr. Staff Research Scientist | de...@google.com | 650-336-8433

Feynman Liang

Aug 2, 2016, 7:00:46 PM
to Douglas Eck, Bob Sturm, Magenta Discuss
Doug,

I'd be happy to once I hand in my thesis on August 12th. You're cited quite a few times in there ;)

Douglas Eck

Aug 2, 2016, 7:36:08 PM
to Feynman Liang, Bob Sturm, Magenta Discuss
Aug 12th!  Stay healthy and get sleep :-)

Tom Collins

Aug 3, 2016, 1:03:42 PM
to Magenta Discuss
Hi Feynman and Bob,

Thanks for the mention! Collaborators and I have been working on automatic generation of stylistic compositions since 2010, mainly Chopin mazurkas, Bach chorales, and Pop ballads (a function of availability of reliable representations and time, rather than preference!). The algorithm is called Racchmaninof (RAndom Constrained CHain of MArkovian Nodes with INheritance Of Form).

We have an article accepted for the inaugural issue of Journal of Creative Music Systems (http://jcms.org.uk/), called

Computer-generated stylistic compositions with long-term repetitive and phrasal structure.

The journal is a little behind their intended publication date of May, but it should be out later this year I think. For generating all four parts of a Bach hymn (one of two target styles evaluated as part of a listening study, the other was Chopin mazurkas), we found that only five out of 25 participants performed significantly better than chance at distinguishing our algorithm's output from original human compositions. These participants had a mean of 8.56 years of formal musical training and mode ‘daily/weekly’ regularity of playing an instrument or singing.

In other words, generating stylistically successful Bach chorales is more a closed problem (closed by us) than an open one, although there are caveats (see below), and big challenges still remain for other Bach composition types, as well as for other composers/styles and application/exploitation opportunities. (And it's still interesting to address a closed problem with a new approach.)

The original study has finished (no data will be collected at the present time and links like submit may be broken), but those curious are welcome to browse the stimuli here


There is a mix of original and computer-generated stimuli here, so email me off-list if you'd like the "solutions".

I'll be at ISMIR and satellite events from tomorrow onwards. Magenta folks and anybody else are welcome to chat with me about this. The source code has been, and remains, available to other academics, in the interests of replicability and of pursuing subsequent research questions. I'm happy to discuss access options for non-academic use, but am reluctant to give away the source code for free while I'm still trying to raise adequate funding to continue our research in this area.

A few more thoughts are below.
All best,

Tom


To offer a position on Bob's question to Feynman, "why are you doing a discriminate test?", we typically collect ratings of stylistic success and free-text responses, in addition to discriminate answers. If you only collect discriminate answers, you can't tell if somebody is right but for the wrong reasons. I like to include discriminate answers because it's a motivating factor for participants: if they only have to rate stylistic success and write a few comments, the task matters less somehow.

In our recent study, while participants couldn't in general discriminate explicitly between original and computer-generated Bach, the difference in stylistic success ratings is significant. My collaborators and I use the framework of Amabile's Consensual Assessment Technique to conduct our listening studies (Collins et al., 2016; Pearce & Wiggins, 2007; Amabile, 1996). Although our algorithm Racchmaninof was designed primarily for generating Chopin mazurkas, the Chopin results remain much worse.

As regards Magenta, I recently saw talk of posting computer-generated excerpts on social media sites and seeing whether people like them as a method of evaluation. There is some real-world validity to this, but people's responses still require thorough evaluation, well grounded in the literature; we need to know about listeners' levels of musical expertise, and the findings need to be replicable. I would like to see as robust an approach taken to evaluating Magenta as was applied to Google's Go-playing algorithm. Just because Magenta is generating art/music doesn't mean the underlying system can't be evaluated in a rigorous, scientific fashion.


Refs

Amabile, Teresa M. (1996). Creativity in context. Boulder, Colorado: Westview Press.

Collins, Tom, Laney, Robin, Willis, Alistair and Garthwaite, Paul H. Developing and evaluating computational models of musical style. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 30(1):16-43, 2016. (Accepted manuscript available from http://tomcollinsresearch.net/pdf/collinsEtAlAIEDAM2016.pdf)

Pearce, Marcus T., and Wiggins, Geraint A. (2007). Evaluating cognitive models of musical composition. Proc. Int. Joint Workshop on Computational Creativity (Cardoso, A., & Wiggins, G.A., Eds.), pp. 73-80, London: Goldsmiths, University of London.


Tom Collins, PhD
Visiting Assistant Professor
Department of Psychology
Lehigh University

Douglas Eck

Aug 3, 2016, 1:46:15 PM
to Tom Collins, Magenta Discuss
Hi Tom,

Thanks for the great email.  
>Racchmaninof (RAndom Constrained CHain of MArkovian Nodes with INheritance Of Form).
:-) 
This is an even better model name than Mike Mozer's excellent CONCERT attempt ("Connectionist Composer of Erudite Tunes"; NIPS 1991).

  
> recently I saw talk of posting computer-generated excerpts on social media sites and seeing if people like them as a method of evaluation

I'm not sure which article (or discussion list posting) you saw. If it was an article, it was likely influenced by the Evaluation section in my initial blog posting. The point I was trying to make is that we don't have reliable *numeric* measures of goodness for generative models. Measures like log-likelihood, perplexity, etc. don't go far in telling us what's good generated art vs. bad generated art. I also said in some interviews that I don't want to hold computer-generated music to lower or higher standards than musicians. If we can develop algorithms that make music people enjoy, and engage with over long periods of time, I'd be very pleased. (We're far from that, I think...)

At the same time, I agree that rigorous, scientific evaluation is warranted of any research.  Small-N (or ideally large-N :-) controlled studies of listeners make a lot of sense. So does expert evaluation.  As Magenta goes in these directions, it will certainly be in collaboration with the academic community. Let's write some papers! 

In parallel, I remain committed to developing ways to improve media generation by learning from feedback provided by co-creators (artists and musicians) and consumers (listeners, readers, viewers).  Learning in domains with little or no ground truth is one of the biggest challenges, if not the biggest challenge, we face in machine learning. 

Finally I think a lot of this depends on the problem being solved. If the goal is to generate convincing pieces of a musical style, then Turing-like tests and questions about successful capture of style make sense. Your work in this area is great.  I'd love to see an open-source framework for doing these sorts of tests easily. Have you considered open-sourcing your framework?

It's less clear to me how to do this in domains where machines are not trying to recreate existing styles.  For example, we might want to generate a jogging soundtrack that doesn't even really try to be music. (I don't want to be challenged musically when I jog. I want something in my ears that drives me to run longer and faster).  Or we might want to create soundscapes that adapt to what's happening visually in VR. I think in these domains, direct measures of effectiveness and engagement deserve focus. That doesn't mean we shouldn't do questionnaires... just that it's less clear to me what we learn from them.  


Feynman Liang

Aug 3, 2016, 2:14:40 PM
to Douglas Eck, Tom Collins, Magenta Discuss

Doug,

I've open sourced the framework for large scale evaluation that's running on BachBot.com:

https://github.com/feynmanliang/subjective-evaluation-client
https://github.com/feynmanliang/subjective-evaluation-server

It's currently tied to Azure but should be pretty straightforward to generalize (it just writes a bunch of JSON which gets processed in a Hadoop job).
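As a sketch of what that JSON post-processing might look like, here is a minimal tally of discrimination accuracy over newline-delimited JSON responses. The field names are hypothetical, not the actual BachBot schema:

```python
import json

# Tally discrimination accuracy from newline-delimited JSON responses,
# as the server-side batch job might. Field names are hypothetical.
def accuracy(lines):
    correct = total = 0
    for line in lines:
        r = json.loads(line)
        total += 1
        correct += r["chosen"] == r["actual_bach"]
    return correct / total if total else 0.0

sample = ['{"chosen": "A", "actual_bach": "A"}',
          '{"chosen": "B", "actual_bach": "A"}']
print(accuracy(sample))  # 0.5
```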

Would be happy to clean it up if there's interest (after 8/12 that is)

Feynman

Douglas Eck

Aug 3, 2016, 2:36:37 PM
to Magenta Discuss, de...@google.com, tomthe...@gmail.com
After 8/12 indeed ;-)
Ok this is really great. Thanks!



Feynman Liang

Aug 3, 2016, 5:53:09 PM
to Tom Collins, Magenta Discuss
Hi Tom,

Thank you for your comments!

I would like to reference your work on Racchmaninof and subjective evaluation in my thesis; is

Collins, Tom, Laney, Robin, Willis, Alistair and Garthwaite, Paul H. Developing and evaluating computational models of musical style. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 30(1):16-43, 2016. (Accepted manuscript available from http://tomcollinsresearch.net/pdf/collinsEtAlAIEDAM2016.pdf)

the correct reference to read further?

Feynman



Tom Collins

Aug 5, 2016, 11:04:05 AM
to Douglas Eck, Magenta Discuss
Hi Doug,

Thanks for your reply.

I agree with the sentiments expressed at https://magenta.tensorflow.org/welcome-to-magenta about the inadequacy of likelihood or related metrics for evaluation of computer-generated art.

The comments I'm referring to are on the MIR list

"Is it good music? The only way to know is to put it out there. It's good music if people listen to it a lot."

and the blog

"we need to get Magenta tools in the hands of artists and musicians, and Magenta media in front of viewers and listeners."

Having seen so much work in the area of auto-generated music that (i) isn't evaluated at all, (ii) is evaluated by the paper's authors ("it can be said that the musical quality is high, compared to previous approaches", Pachet & Roy, 2014, p.7), or (iii) is evaluated by a likelihood metric alone without any acknowledgment that this is at best a poor proxy, I am eager for researchers in this area to reconnect with some of the more thorough evaluation methodologies set out by Pearce & Wiggins (2007) and Pearce, Meredith, and Wiggins (2002).

I also agree that the output of an algorithm intended for stylistic composition is easier to evaluate than that of an algorithm intended for use cases such as the jogging soundtrack you describe. Harder again is evaluating free composition (composition without any specific stylistic aims or reference points). Harder still is evaluating the use of Magenta (or whatever else) by artists and musicians in a creative/iterative/responsive mode. Peter Knees, Christian Coulon, and I will have a late-breaking/demo paper on an initial attempt to do this:
We got time-lapse composition edits by virtue of a browser-based piano-roll interface, which enabled us to quantify basics such as percentage of edits that were requests for auto-suggestions, percentage of suggestions undone, correlation of overall user enjoyment with percentage of suggestions undone, etc.
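Metrics like these reduce to simple tallies over the session's event log. A sketch with hypothetical event names (the paper's actual logging format is not specified here):

```python
# Quantify a piano-roll editing session from its event log.
# Event names ("edit", "suggest", "undo_suggestion") are hypothetical.
def edit_stats(events):
    n = len(events)
    suggestions = sum(e == "suggest" for e in events)
    undone = sum(e == "undo_suggestion" for e in events)
    return {
        "pct_suggest": 100 * suggestions / n,               # share of edits that requested a suggestion
        "pct_suggestions_undone": 100 * undone / max(suggestions, 1),
    }

log = ["edit", "suggest", "edit", "suggest", "undo_suggestion", "edit"]
print(edit_stats(log))  # pct_suggest ~= 33.3, pct_suggestions_undone = 50.0
```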

As regards the framework I've put together for my listening tests over the years, I'm happy to open-source it. Because of my career stage and position in academia, it just needs to be linked to a citable paper, because hiring committees in Music/Psych tend not to have heard of, or care about, GitHub. If somebody from Magenta wants to help package and write it up, that could be a good collaboration.

See you at ISMIR!
Tom


Refs
Pachet, François, and Roy, Pierre. (2014). Non-conformant harmonization: the Real Book in the style of Take 6. Proceedings of the International Conference on Computational Creativity, pp. 100-107. Ljubljana, Slovenia.

Pearce, Marcus T., Meredith, David, and Wiggins, Geraint A. (2002). Motivations and methodologies for automation of the compositional process. Musicae Scientiae, 6(2):119-147.

Pearce, Marcus T., and Wiggins, Geraint A. (2007). Evaluating cognitive models of musical composition. Proc. Int. Joint Workshop on Computational Creativity (Cardoso, A., & Wiggins, G.A., Eds.), pp. 73-80, London: Goldsmiths, University of London.





joerg wichard

Aug 25, 2016, 12:42:58 PM
to Magenta Discuss

Hi Feynman,
great work, congrats.
I'm really curious to read your thesis.
I re-entered the field recently and I'm really fascinated by the progress.
When I wrote my thesis (some years ago), there was a time series prediction challenge from the Santa Fe Institute. The final task (at that time more or less science fiction) was to predict the end of Bach's last fugue:
http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
Regards
Joerg

Bruno Wu

Aug 25, 2016, 2:04:34 PM
to Magenta Discuss
Great work, Feynman!
