Hi Steve, cc'ing Krishn as he was TA for me last year and may remember some additional things. And obviously Randy can chime in.
But two points:
1. It is possible the threshold for counting a trial as accurate is too liberal. You can simply go to the srn.go code and change
sse, avgsse := out.MSE(0.5)
to
sse, avgsse := out.MSE(0.25)
or so, and then it will require the correct unit to be within 0.25 of its target to count as accurate. That should make it clear the network isn't learning in that case without a hidden or SRN layer.
2. Another thing we noticed last year is that the network can solve a first-order sequence without a context layer. Not as reliably, but it still could, which was weird. It turned out that this was because in the newer versions, even though there is no context, the Ge values have some hysteresis from one trial to the next. So in the very first cycle of trial 2, regardless of the input, the Ge values in the neurons are biased to be higher for those that had the highest Ge at the end of the last trial. This is presumably related to GTau, which probably doesn't get reset across trials even if activations do. As such, the network can capitalize on this trace bias to have some memory.
I don't remember if we fixed that or just explained it to the students as the reason there is still some success (the Ge memory effectively acts like a context layer). So this is a heads up on something to look for, and maybe it could be addressed by changing those Ge params.
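A rough sketch of the kind of carry-over I mean, assuming Ge is integrated exponentially toward its raw input with time constant GTau. The update rule, the stepGe helper, and all the numbers here are assumptions for illustration and are not taken from the simulator's code.

```go
package main

import "fmt"

// stepGe runs one cycle of an assumed exponential integration:
// Ge moves toward the raw input by a fraction 1/gTau per cycle.
func stepGe(ge, raw, gTau float64) float64 {
	return ge + (raw-ge)/gTau
}

func main() {
	const gTau = 1.4 // illustrative time constant, must be > 1 to leave a trace
	raw := 0.5       // same raw input on the first cycle of the new trial

	// Neuron that ended the previous trial with high Ge, not reset.
	carried := stepGe(0.8, raw, gTau)
	// Same neuron if Ge had been reset to zero at the trial boundary.
	reset := stepGe(0.0, raw, gTau)

	fmt.Printf("first-cycle Ge, carried over: %.3f\n", carried)
	fmt.Printf("first-cycle Ge, reset:        %.3f\n", reset)
}
```

Under these assumptions the neuron that was most active at the end of the last trial starts the new trial with a higher Ge for the same input, which is the trace the network could be exploiting. In this sketch, a time constant close to 1 or zeroing Ge at the trial boundary removes the difference.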