Ok, thanks again, Jan! This helps a lot.
But in the first post you said "the whole point of having different sampling rates is to the reduce the problem complexity" -- that's why I assumed you want to process at the lower rate in the end.
Yeah, I wasn't being very clear in my writing: I'm using sequences that are quite long in duration (e.g. 4000 1-second time steps) with a handful of variables/features (7-9). For some of these features the full sample rate is needed, because there are fast dynamics with a large immediate impact on the output, as well as slower changes that matter over long timescales. For others, only the slower changes matter, so heavily downsampling them is fine.

The full input space is large enough that I start running into difficulties converging, as well as some prohibitive restrictions on batch size and network size (running out of memory). As part of pre-processing the data, reducing the input space by downsampling each feature by the same amount (e.g. the first 2000 time step values sampled at a low rate, the next 1000 sampled at a higher rate, then the last 1000 sampled at the full rate) helped quite a lot with this.

But because I know I can get away with far fewer samples for a few of the features in particular, I'd like to be able to cut those out. So what I really meant is that I want to reduce the overall size of the input by discarding information I know I can throw away for several of the features, both to make the problem simpler for the network to learn and to fit larger batches into training.
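For concreteness, here's a minimal numpy sketch of the per-feature downsampling I mean (the factors and sequence length are just made-up examples, not my actual values):

```python
import numpy as np

def downsample_feature(x, factor):
    """Keep every `factor`-th sample of a 1-D feature sequence."""
    return x[::factor]

# Hypothetical 4000-step sequence sampled once per second.
full = np.arange(4000, dtype=np.float32)

# A "slow" feature can keep 1 sample in 10 -> 400 values instead of 4000.
slow = downsample_feature(full, 10)

# A "fast" feature keeps the full rate, since its quick dynamics matter.
fast = downsample_feature(full, 1)

print(slow.shape, fast.shape)  # (400,) (4000,)
```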
With the Upscale1DLayer I'd end up with the full sequence size again for all features at some point, so it seems I might not avoid the memory issues that way, unless processing the two sets of data separately lets me get away with smaller or fewer layers? I'll give it a shot regardless.
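If I understand the upscaling right (repeating each value along the time axis), the tensor after that layer is back to the full-rate length, which is exactly the memory cost I was trying to avoid. A toy numpy analogue of that effect, with an assumed 10x factor:

```python
import numpy as np

# A feature downsampled 10x: 400 steps instead of 4000.
low_rate = np.zeros(400, dtype=np.float32)

# Naive repeat-upsampling (what an upscale-by-repetition layer does)
# brings it straight back to the full sequence length.
up = np.repeat(low_rate, 10)

print(up.shape)  # (4000,)
```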
Cheers, and thanks again,
Lee