Cutting and stacking your data most likely isn't the right approach. Remember, convolutions find spatial features in data: in an image, two neighboring pixels are related whether one is left/right of the other or above/below it, so a 2D convolution makes sense. If you stacked your data as you suggest, samples within a row would still be related (as they are now, assuming your data is some kind of signal, e.g. in the time domain), but samples within a column would not be, so a 2D convolution wouldn't make much sense column-wise.
Fortunately, convolution over 1D data is perfectly legal, and there should be no technical problem with training a CNN on 1x3000 data. It can still be considered a 2D mxn image, just a very thin one, with m=1 and n=3000, so you can create a dataset as you normally would. You cannot, however, use the standard LeNet setup on this kind of data, because it is designed for 28x28 inputs. You'd have to define your own network architecture with 1D (specifically: 1xK) kernels.
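To make this concrete, here is a minimal sketch of such an architecture in PyTorch, which provides `nn.Conv1d` for exactly this case. All layer sizes, channel counts, and the number of output classes below are illustrative placeholders, not values from your problem:

```python
import torch
import torch.nn as nn

# Hypothetical 1D CNN for signals of length 3000 with 1 input channel.
# Kernel sizes, channel counts, and the 10-class output are placeholders.
model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=16, kernel_size=7, padding=3),  # 1xK kernel, K=7
    nn.ReLU(),
    nn.MaxPool1d(4),                       # length 3000 -> 750
    nn.Conv1d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool1d(4),                       # length 750 -> 187
    nn.Flatten(),
    nn.Linear(32 * 187, 10),               # 10 classes, purely illustrative
)

x = torch.randn(8, 1, 3000)  # batch of 8 signals, 1 channel, 3000 samples
out = model(x)
print(out.shape)             # torch.Size([8, 10])
```

Note that the input tensor has shape `(batch, channels, length)` rather than `(batch, height, width, channels)`; the convolution slides only along the length dimension, which is exactly the 1xK behavior described above.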