Hello Francisco,
One way to think about it is this: a model for the problem you're trying to solve is that you want to minimize, over matrices X, the function
f(X) = \sum_{(i,j) \in S} (X_{ij} - Y_{ij})^2
where S is a certain set of positions in the matrix. We can rewrite this as follows: define M to be a sparse, binary matrix, with M_{ij} = 1 if (i, j) \in S, and M_{ij} = 0 otherwise. Then, we can write
f(X) = || (X - Y) .* M ||_F^2
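(Just to make this concrete, here is a rough Matlab sketch of how you could build M and evaluate f; all names and numbers below, m, n, k, I, J, Yobs, are placeholders I made up for illustration.)

m = 100; n = 80;                         % size of the matrices (made-up example)
k = 500;                                 % number of observed entries
idx = randperm(m*n, k);                  % k distinct positions, so M stays binary
[I, J] = ind2sub([m, n], idx(:));        % row/column indices of the observed entries
Yobs = randn(k, 1);                      % observed values Y_{ij} for (i, j) in S
M = sparse(I, J, 1, m, n);               % sparse binary mask: M_{ij} = 1 iff (i, j) in S
Y = sparse(I, J, Yobs, m, n);            % Y only needs to be correct on the mask
cost = @(X) norm((X - Y).*M, 'fro')^2;   % f(X) = || (X - Y) .* M ||_F^2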
So, f is a squared Frobenius norm, except we've "masked" the difference X - Y before taking that norm. We can write this as
f(X) = < (X - Y).*M, (X - Y).*M >
where <A, B> = trace(A'*B). This is easy to differentiate:
Df(X)[Z] = 2 < Z .* M, (X - Y) .* M >
and within the inner product, we can move an entry-wise product from one argument to the other (indeed, <A, B> = trace(A'*B) = \sum_{i,j} A_{ij} B_{ij}, so multiplying A entry-wise by M gives the same value as multiplying B entry-wise by M):
Df(X)[Z] = 2 < Z, (X - Y) .* M .* M >
This is true for all directions Z, hence
grad f(X) = 2 (X - Y) .* M .* M
(Notice that, since M is binary, M .* M = M, so this simplifies to grad f(X) = 2 (X - Y) .* M.)
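(Continuing the sketch from above, you can sanity-check this formula against a finite-difference estimate of the directional derivative; X, Z and t below are again made up for illustration.)

egrad = @(X) 2*(X - Y).*M;                         % the gradient we just derived
X = randn(m, n);                                   % a random test point
Z = randn(m, n);                                   % a random direction
t = 1e-6;
fd = (cost(X + t*Z) - cost(X - t*Z)) / (2*t);      % finite-difference estimate of Df(X)[Z]
ip = full(sum(sum(egrad(X) .* Z)));                % <grad f(X), Z>
fprintf('finite difference: %g, analytic: %g\n', fd, ip);   % these two should agree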
In other words: the gradient of the masked Frobenius norm is just the usual gradient, masked.
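(If the reason you're asking is to feed this to Manopt, which is just a guess on my part, then the cost and this Euclidean gradient plug in directly; replace the factory below by whatever manifold you actually optimize over.)

problem.M = euclideanfactory(m, n);               % note: problem.M is the manifold, unrelated to the mask M
problem.cost  = @(X) norm((X - Y).*M, 'fro')^2;   % f(X), reusing Y and M from the sketch above
problem.egrad = @(X) 2*(X - Y).*M;                % Euclidean gradient; Manopt converts it if needed
checkgradient(problem);                           % numerical gradient check
Xopt = trustregions(problem);                     % run a solver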
Hope this helps,
Nicolas