Hi Eirinn,
First of all, welcome to the forums! I'm happy you're willing to give Bonsai a try and that you're providing this feedback. It is very useful to help us decide how to improve the system.
Indeed, presenting information about possible inputs is something that has been debated since the very first version. The reason this is difficult in general is that Bonsai nodes can be extremely flexible in the kinds of inputs they accept, and in extreme cases this can even be determined dynamically (i.e., it is not fixed).
However, it's true that most nodes either handle only a specific subset of input types or accept any type at all. At least for these nodes (which are the vast majority) it would be useful to have this information displayed to the user. Of course, a more comprehensive reference manual describing the behavior of each node would help immensely here as well, which is also something that has been in the plans for quite a while...
Anyway, regarding the problem at hand, I think I can provide a few pointers. Sorry for the long answer but as I hope you will see there is quite a lot of depth to Bonsai that is not immediately apparent, and this is a good example for exploring these questions. Let me know if anything is not obvious. In the end if things are not clear I can just post an example workflow that works and we can keep discussing as needed.
Ok, starting with the audio functions. As you've correctly inferred, the AudioPlayback node takes in an OpenCV Mat, which is the datatype chosen to represent any digital time series in Bonsai. The best way to try out this node is to either feed in the output of the microphone directly like this:
which essentially creates a feedback "echo" system; or you can use FunctionGenerator to generate synthetically the waveforms:
in this case you may have to set the Frequency parameter to 10 and the Scale to 100 or 1000 before you can generate anything audible.
In the latest version you can also use AudioReader to load in a WAV file but I'll leave it as an exercise ;-)
Now the question would be how to trigger this playback from an arbitrary event, like your ROI activation. The trick here will be to switch playback on and off. There are essentially two ways to do this, depending on what you want to achieve. One way is to simply "gate" the data coming in from FunctionGenerator. In this way, the source is always generating data, but you either allow it to go through to the AudioPlayback or not depending on a boolean value. I will use a KeyDown node where I test for a specific key to simulate a source of boolean values, but you can replace this by your ROI test. This would look something like the following:
Breaking this down, you have two main branches: one for generating the audio signal (FunctionGenerator + ConvertScale), another to compute the toggling signal (KeyDown + Equal).
First, you need to somehow correlate these two data streams, since the behavior of your final sound stream now depends on the state of both. CombineLatest is a common way to do this: it joins the latest data from both streams into a single combined state tuple (Item1, Item2). We can then use this combined state to make decisions. We will use a Condition node to decide whether the sound packet goes through or not.
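To make the CombineLatest behavior concrete, here is a minimal sketch of its semantics in plain Python. This is only an analogy, not Bonsai code: Bonsai workflows are graphical, and the stream names ("audio", "key") and packet labels are invented for illustration.

```python
def combine_latest(events):
    """Merge a timeline of tagged events into (Item1, Item2) pairs.

    `events` is a list of ("audio", packet) or ("key", state) tuples
    in arrival order.  A combined pair is emitted whenever either
    stream produces a value, once both have produced at least one,
    always pairing the newest value from each stream.
    """
    latest = {"audio": None, "key": None}
    seen = set()
    combined = []
    for stream, value in events:
        latest[stream] = value
        seen.add(stream)
        if {"audio", "key"} <= seen:
            combined.append((latest["audio"], latest["key"]))
    return combined

timeline = [("audio", "pkt0"), ("key", False), ("audio", "pkt1"),
            ("key", True), ("audio", "pkt2")]
print(combine_latest(timeline))
```

Notice how each output pairs the newest sound packet with the newest key state, which is exactly the combined state the Condition node will test.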
Condition works something like an "If" or "Switch-Case" statement in other programming languages. Data will only go down the Condition branch if some test evaluates to True. You can specify the nature of the test by double-clicking the Condition node and changing the nested workflow inside. In this case, we want to specify that the test evaluates to True when the state of the key workflow is True. This state is now inside Item2 of the combined state being output from CombineLatest (you can check this by right-clicking on CombineLatest and inspecting its output). In the end, the workflow inside Condition would look something like this:
Basically it just takes the boolean value from the key workflow and uses that to evaluate the condition. When the state of the key test is True, the data can go through; otherwise it is filtered out (dropped).
In the end, we just select the sound packets (Item1 from the CombineLatest output) and send them to the AudioPlayback. With this workflow, you should have a pure tone that plays or stops depending on which key you press (you have to pick which key controls this in the Equal node).
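The Condition test on Item2 plus the final selection of Item1 can be sketched together as a simple filter-and-project step in plain Python (again just an analogy for the graphical workflow, with invented packet labels):

```python
# Combined-state pairs as they would come out of CombineLatest:
# (sound packet, key state)
pairs = [("pkt0", False), ("pkt1", True), ("pkt2", True), ("pkt3", False)]

# Condition: let a pair through only when Item2 (the key state) is True;
# then select Item1 (the sound packet) to send on to playback.
audible = [item1 for item1, item2 in pairs if item2]
print(audible)
```

Only the packets that arrived while the key test was True survive; everything else is dropped, which is the "mute button" behavior described next.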
Now, what happens with this approach is that the sound is always "playing". Effectively what we designed is a dynamic "mute" button. The potential problem is that you have no control over where in the waveform the sound is when you unmute it...
If you want the sound to play from the beginning every time you press a key, you have to somehow "start" the sound playback only when the key happens. Imagine for example you had the original FunctionGenerator workflow but now you start and stop it dynamically based on a key press. Whenever something you want to do feels like you need to start/stop workflows on the fly, you probably need to use SelectMany.
SelectMany is a complicated beast to explain succinctly because you can do so much with it. However, one of the best ways to think about it is literally taken from its name. The name actually comes from the Reactive Extensions (Rx) framework, and before that from SQL, a popular language used to query databases. It turns out a good way to think about Bonsai is just as a language that allows you to build graphical queries over "live" databases (i.e. data streams). Each element coming through one of these data streams represents a "row", and its properties are the "attributes" or "columns" of the database.
In this framework, a "Select" operation is when you take one row (an element) and project (i.e. transform) some of its attributes to generate a result. For example, I can take a MouseMove data stream where rows are XY positions whenever I move my mouse, and I can "Select" a result where I multiply both X and Y values together. You can easily imagine many such operations, and all of Bonsai's DSP and Vision operators are simple "Select" operations in this sense.
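The MouseMove example above can be sketched as a plain "Select" (a per-row projection) in Python; the positions are made-up sample data:

```python
# Rows of a "live database": (x, y) positions from mouse movement.
positions = [(2, 3), (4, 5), (10, 10)]

# Select: project each row to a single result, here the product x * y.
products = [x * y for x, y in positions]
print(products)  # [6, 20, 100]
```

One row in, one result out: that one-to-one shape is what distinguishes Select from SelectMany below.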
Now another interesting possibility is when you want to "SelectMany" rows. What this means is that you take your input element, but now instead of doing some kind of transformation that will generate a new output "row", you want to actually expand it into "many" rows. If you think about your trigger data stream that will play a sound for each event, you are in this situation. For each individual input "row" (or trigger event), you want to generate a sequence of "many" sound packets that will be sent to the AudioPlayback node.
Another (potentially simpler) way to think about it. Imagine you had the following workflow:
This is simply our original FunctionGenerator workflow, but now with the addition of a Take operator that specifies that we will only generate a fixed number of sound packets (e.g. 100). Now if we go back to the scenario of playing a sound whenever a key is pressed, what we really want is to start this workflow when the key is pressed and collect all the generated packets. That's exactly what SelectMany does, so it turns out we can reduce this scenario to the following workflow:
Where the workflow that is "played" every time the key is pressed is specified inside the SelectMany node, and is simply:
So again, and summing up, the only thing we are doing is playing the FunctionGenerator workflow whenever the key is pressed, and sending all the generated packets to the AudioPlayback.
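This expand-each-trigger idea can also be sketched in plain Python (once more an analogy, not Bonsai code; the `tone` helper and packet labels are invented stand-ins for the FunctionGenerator + Take workflow):

```python
import itertools

def tone(n):
    """Stand-in for the FunctionGenerator + Take workflow: a finite
    burst of n sound packets, here just labelled strings."""
    return [f"pkt{i}" for i in range(n)]

key_presses = ["space", "space"]  # two trigger events

# SelectMany: expand each trigger into many rows (a whole burst of
# packets), concatenating the results into one output stream.
playback = list(itertools.chain.from_iterable(tone(3) for _ in key_presses))
print(playback)  # ['pkt0', 'pkt1', 'pkt2', 'pkt0', 'pkt1', 'pkt2']
```

Each key press replays the nested workflow from the beginning, which is exactly the "start the sound from the top on every trigger" behavior we wanted.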
I will leave the rest of the problem again as an exercise, but do let us know if any issues pop up. The way to think about these problems in Bonsai can be quite compact, but it is often very different from how you would approach them in a traditional programming language, so it takes some getting used to.
Thanks for the feedback. I'm always looking forward to hearing about nice new directions in which we can extend Bonsai.
Cheers,