Thanks, Florian. I looked at the apply implementation and you're correct that it uses the storage. (Here, if anyone's curious.) I think you do get a speedup by using the storage, because the storage has no concept of bounds checking. Anyway, I suppose what I wanted isn't really necessary, since apply uses a Lua for loop directly anyway. Fortunately, the for loop I wanted to avoid turned out not to be the limiting factor in my algorithm.
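For anyone following along, this is roughly how apply gets used; the tensor and the doubling function below are just made up for illustration:

local t = torch.Tensor(5):fill(3)
-- apply calls the Lua function once per element and writes the returned value back in place
t:apply(function(x) return x * 2 end)
print(t) -- every element is now 6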
@Riddhiman: The best way to demonstrate is with an example. Consider the forward propagation of a linear layer of a neural network. Your code would look like this in Torch:
local n = 128 -- batch size
local f1 = 25 -- input feature size
local f2 = 64 -- output (hidden) feature size
local X = torch.Tensor(f1, n) -- input features
local W = torch.Tensor(f1, f2) -- weights
local b = torch.Tensor(f2) -- bias
-- initialize the above three tensors
-- ...
-- In MATLAB, you would use bsxfun here to broadcast the addition of b
-- over the second (batch) dimension.
local output = W:t() * X + torch.expand(b:resize(f2, 1), f2, n)
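One thing worth noting about torch.expand, as I understand it: it doesn't copy the bias n times, it just returns a view whose stride along the expanded dimension is 0, so every column shares b's storage. A quick sanity check, using the sizes from the example above:

local expanded = torch.expand(b:resize(f2, 1), f2, n)
print(expanded:size())    -- f2 x n, i.e. 64 x 128
print(expanded:stride(2)) -- 0, since all the columns are views of the same data
print(output:size())      -- also f2 x n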