Color augmentation (a la Krizhevsky et al)

bawdyb

Jun 23, 2015, 5:59:30 AM
Hi guys,

I have been trying to implement the color intensity augmentation (from Krizhevsky et al. 2012). However, the explanations provided in the paper are not clear to me. Could someone please explain what the authors mean by: "Specifically, we perform PCA on the set of RGB pixel values throughout the ImageNet training set."? Did they perform PCA over each of the channels? And why do they only take a 3 by 3 covariance matrix for the pixels?

I saw that Sander applied the same method in his Kaggle galaxy competition entry, but I don't understand how he obtained the PCA values; they are provided directly in the code as a vector (in realtime_augmentation.py):

colour_channel_weights = np.array([-0.0148366, -0.01253134, -0.01040762], dtype='float32')

Many thanks

Sander Dieleman

Jun 23, 2015, 6:13:22 AM
Basically, you have to treat every pixel in every image as a data point. This means you have a ton of data points which are vectors with 3 values: R, G and B. You can then compute PCA on these datapoints. This means you have to compute the covariance matrix of these vectors, which is a 3x3 matrix.

PCA will give you 3 vectors with 3 components. You can then sample 3 scale parameters, and add scaled versions of each of these 3 vectors to all pixels in the image. For best results you should also scale them by the corresponding eigenvalues. This will perturb the image colours along these PCA axes.

Note that what I did for the galaxy challenge is not exactly the same: I noticed that one of the PCA vectors had a much larger eigenvalue than the others, so it was clearly dominant. That's why I didn't bother using the other two vectors. I only used the one with the largest eigenvalue, so this was basically equivalent to brightness perturbation instead of colour perturbation.

Sander
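
The procedure described above can be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the function names fit_rgb_pca and perturb_colours, the channels-last (N, H, W, 3) layout, the random stand-in images and the 0.1 standard deviation are all assumptions made for the example.

```python
import numpy as np

def fit_rgb_pca(images):
    """images: float array of shape (N, H, W, 3). Returns (eigvals, eigvecs)."""
    pixels = images.reshape(-1, 3)          # every pixel is one 3-d data point
    cov = np.cov(pixels, rowvar=False)      # 3x3 covariance (mean removed internally)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvectors are the columns of eigvecs
    return eigvals, eigvecs

def perturb_colours(image, eigvals, eigvecs, sigma=0.1, rng=None):
    """Add one random RGB offset to every pixel of an (H, W, 3) image."""
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.normal(0.0, sigma, size=3)  # one scale parameter per PCA axis
    offset = eigvecs @ (alphas * eigvals)    # scaled by the eigenvalues, as described above
    return image + offset                    # broadcasts over height and width

rng = np.random.default_rng(0)
images = rng.random((8, 16, 16, 3))          # random stand-in for a training set
eigvals, eigvecs = fit_rgb_pca(images)
augmented = perturb_colours(images[0], eigvals, eigvecs, rng=rng)
```

The PCA is fitted once on the whole set, while a fresh set of three scale parameters is drawn for each image, so every pixel of one image is shifted by the same RGB offset.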

bawdyb

Jun 23, 2015, 10:15:26 AM
Thanks Sander, I tried to code it as two functions. compute_PCA is called right after loading the data set, and then for each batch I call add_color_noise(). Is this correct?

def compute_PCA(image_array):
    # Transpose and reshape the original image_array from N x channels x height x width  to  N x height x width x channels
    imT = image_array.transpose(0,2,3,1)
    reshaped_array = imT.reshape(imT.shape[0]*imT.shape[1]*imT.shape[2],3)
    # Get covariance matrix, the eigenvectors and eigenvalues
    cov = np.dot(reshaped_array.T, reshaped_array) / reshaped_array.shape[0]
    U,S,V = np.linalg.svd(cov)
    eigenvalues = np.sqrt(S) # because cov is symmetric and psd
    return eigenvalues,U

def add_color_noise(image_array,eigenvalues,U,mu=0,sigma=0.1):
    for idx in xrange(image_array.shape[0]):
        # Generate the \alpha samples
        samples = np.random.normal(mu, sigma, 3)
        augmentation = samples * eigenvalues
        noise = np.dot(U, augmentation.T)
        # Add the noise
        z = image_array[idx].transpose(1,2,0) + noise / eigenvalues # Scale here with the corresponding eigenvalue ?
        image_array[idx] = z.transpose(2,0,1)

Sander Dieleman

Jun 23, 2015, 10:57:45 AM
You multiply with the eigenvalues and then divide by them again, that doesn't make sense. You should not need to divide by them.

Sander

bawdyb

Jun 23, 2015, 2:59:22 PM
True! Thanks for the correction!

liao...@gmail.com

Jun 22, 2016, 10:59:14 AM
to lasagne-users, baw...@gmail.com
By the way, you should normalize your data before doing PCA, i.e. convert images from the 0-255 scale to the 0-1 scale. Otherwise, your color augmentation will produce much larger values. See the post here: http://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-analysis

Jan Schlüter

Jun 23, 2016, 8:05:30 AM
to lasagne-users, baw...@gmail.com
And you shouldn't do a for loop over image_array.shape[0]. Just create enough samples for all data points in your minibatch in a single np.random.normal call. Furthermore, reshape the noise so it is broadcast correctly instead of transposing the image data back and forth (image_array += noise[:, :, np.newaxis, np.newaxis]). This will make things a lot faster!

Best, Jan
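
A loop-free version along these lines might look like the sketch below. It is a minimal illustration under assumptions: the function name add_colour_noise_batch is made up, the batch layout is taken to be (N, 3, H, W) as in the code earlier in the thread, and the eigenvalues and eigenvectors used for the demonstration are placeholders rather than real PCA results.

```python
import numpy as np

def add_colour_noise_batch(batch, eigvals, eigvecs, sigma=0.1, rng=None):
    """batch: float array of shape (N, 3, H, W); returns a perturbed copy."""
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.normal(0.0, sigma, size=(batch.shape[0], 3))  # one draw per image
    noise = (alphas * eigvals) @ eigvecs.T                      # (N, 3) RGB offsets
    return batch + noise[:, :, np.newaxis, np.newaxis]          # broadcast over H and W

# Placeholder PCA results, purely for demonstration
eigvals = np.array([0.01, 0.02, 0.2])
eigvecs = np.eye(3)
batch = np.zeros((4, 3, 5, 5))
out = add_colour_noise_batch(batch, eigvals, eigvecs, rng=np.random.default_rng(0))
```

Each row of noise is the linear combination of eigenvector columns for one image, so no transposing of the image data is needed.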

webs...@gmail.com

Dec 15, 2016, 6:17:08 PM
to lasagne-users, baw...@gmail.com
I am really struggling to implement this fancy PCA augmentation method, here is what I believe I must do (correct me if I am wrong):
1) Create a matrix where the first column contains all the red pixel data, the 2nd column all the green pixel data and the 3rd all the blue pixel data from all the images in the dataset.
2) Calculate the mean of every column and subtract it from every respective column.
3) Normalise the data between 0 and 1? (is this necessary? since all values are already between 0 and 255)
4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.
5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2  + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).

But from this code
colour_channel_weights = np.array([-0.0148366, -0.01253134, -0.01040762], dtype='float32')
it seems that the colour channel weights are very small, and multiplying them by a random number less than 1 will make them even smaller. So wouldn't the augmentation have only a very slight effect on the original data (perturbing it by a minuscule amount, less than 1%)?

Am I on the right track here?

Jan Schlüter

Jan 6, 2017, 4:33:52 PM
to lasagne-users, baw...@gmail.com, webs...@gmail.com
I am really struggling to implement this fancy PCA augmentation method, here is what I believe I must do (correct me if I am wrong):
1) Create a matrix where the first column contains all the red pixel data, the 2nd column all the green pixel data and the 3rd all the blue pixel data from all the images in the dataset.
Correct. Let's call this matrix "yourdata".

2) Calculate the mean of every column and subtract it from every respective column.
Correct, but step 4) can do this for you.

3) Normalise the data between 0 and 1? (is this necessary? since all values are already between 0 and 255)
Well, if you divide your data by 255, the eigenvalues will be 255*255 times smaller. The eigenvectors are the same. It depends on what scale the data is when you apply the color perturbation.

4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.
cov = np.cov(yourdata.T)  # this already includes mean removal. note the transpose.
eigvals, eigvects = np.linalg.eigh(cov)

5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2  + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).
I think you want the square root of the eigenvalues:
pca = np.sqrt(eigvals) * eigvects
perturb = (pca * np.random.randn(3) * 0.1).sum(axis=1)  # multiply by row vector, then sum horizontally (the eigenvectors are in columns)
Now you have an RGB perturbation vector to add to your image, which should be on the same scale you used in step 3.

Hope this helps!

jkqu...@gmail.com

Jan 17, 2017, 6:34:19 PM
to lasagne-users, baw...@gmail.com, webs...@gmail.com
Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.

Jan Schlüter

Jan 19, 2017, 1:42:17 PM
to lasagne-users
Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.

Good point, you may want to clip this back to the usual input range. It's surely possible -- you're sampling multiplication factors from a Gaussian. But note that I haven't checked back whether Alex used the square root of Eigenvalues or the Eigenvalues themselves -- if the latter, the result depends on what scale the input data was in (still in 0--255, or already in 0--1).

Jonathan Quijas

Jan 19, 2017, 2:03:07 PM
Thanks! I have been checking for values less than 0 or greater than 1, and making them 0 or 1 respectively. I am using the square root of the eigenvalues. The method now works like a charm! Also, I am unclear as to why we are taking the square root of the eigenvalues. Without this step, the factors are just too high and the entire image needs normalization, but the paper which introduced the "fancy PCA" color jitter idea never touches on taking the square root of the eigenvalues. Any insight on this?

Thanks!! :D

--
You received this message because you are subscribed to a topic in the Google Groups "lasagne-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lasagne-users/meCDNeA9Ud4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lasagne-users+unsubscribe@googlegroups.com.
To post to this group, send email to lasagn...@googlegroups.com.

Jan Schlüter

Jan 19, 2017, 2:23:59 PM
to lasagne-users, jkqu...@gmail.com
Also, I am unclear as to why we are taking the square root of the eigenvalues.

To make the results independent of the scale of the images you began with.

the paper which introduced the "fancy PCA" color jitter idea never touches on taking the square root of the eigenvalues. Any insight on this?

Maybe they didn't use the square root, but had the input range such that it worked fine.
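
The scale argument can be checked numerically. The sketch below uses random stand-in pixels (an assumption, not real image data) and compares the PCA eigenvalues of the same pixels on the 0..1 and 0..255 scales. The eigenvalues are variances, so they grow with the square of the data scale, while their square roots grow in proportion to the data.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels_01 = rng.random((10000, 3))      # stand-in for RGB pixels scaled to 0..1
pixels_255 = pixels_01 * 255.0          # the same pixels on the 0..255 scale

eigvals_01 = np.linalg.eigvalsh(np.cov(pixels_01, rowvar=False))
eigvals_255 = np.linalg.eigvalsh(np.cov(pixels_255, rowvar=False))

ratio_raw = eigvals_255 / eigvals_01                     # about 255**2
ratio_sqrt = np.sqrt(eigvals_255) / np.sqrt(eigvals_01)  # about 255
```

With the square root, a perturbation computed on 0..255 data is 255 times the one computed on 0..1 data, which is the same relative change. With the raw eigenvalues it would be 255**2 times larger, so the relative strength of the jitter would depend on the input range.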

samje...@gmail.com

Jan 25, 2017, 5:31:21 PM
to lasagne-users
Hey, I am also trying to implement this PCA augmentation. My code seems to have a bug in it: when I try to add the RGB noise to the image, the noise either blanks the image or makes it look nothing like the original.
Here's the code:

def PCA(data, dims_rescaled_data):
    imgvector = data.reshape(-1, 3)
    # print imgvector
    # calculate the covariance matrix
    R = np.cov(imgvector.T)
    # calculate eigenvectors & eigenvalues of the covariance matrix
    # use 'eigh' rather than 'eig' since R is symmetric,
    # the performance gain is substantial
    evals, evecs = LA.eigh(R)
    # sort eigenvalue in decreasing order
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:, idx]
    # sort eigenvectors according to same index
    evals = evals[idx]
    # select the first n eigenvectors (n is desired dimension
    # of rescaled data array, or dims_rescaled_data)
    evecs = evecs[:, :dims_rescaled_data]
    # carry out the transformation on the data using eigenvectors
    # and return the re-scaled data, eigenvalues, and eigenvectors
    return evals, evecs

def perturbation_eigen( img):
    eVal, eVec = PCA(img, 1)
    pca = np.sqrt(eVal) * eVec
    perturb = (pca * np.random.randn(3) * .1).sum(axis = 1)
    print perturb
    imgvector = img.reshape(-1, 3)
    print '\n'
    print imgvector
    new_imgvector = np.add(imgvector, perturb) # error
    unshaped_img =  new_imgvector.reshape(220, 220, -1 )
    return unshaped_img

Any idea what's wrong? I think it's how I am adding the noise to the image, but other than that I'm not really sure.

Jonathan Quijas

Jan 25, 2017, 8:30:58 PM
I see you are only using the eigenvalue and eigenvector of the principal component with the highest variance. Why not try all three? This will result in an actual "pixel jitter" of the form [I_r, I_g, I_b]. Also, do not reshape the image as you do in:
imgvector = img.reshape(-1, 3)

I hope this helps:
#imgvector = img.reshape(-1, 3) #Comment this reshape step out, then add the following code
#   Make sure all values are in correct range
img[img > 1.0] = 1.0
img[img < 0.0] = 0.0
return img

I would also really suggest using singular value decomposition (SVD) instead of the eigendecomposition. Numerically speaking, it is much more stable. Finally, did you center all your pixel data around their means? In other words, don't forget to compute the mean red value, mean green value, and mean blue value (three scalars). Then subtract from each value its corresponding color mean, sort of like this:
#   Compute centered rgb values matrix
(rows,cols,colors) = image.shape
rgb_mat = np.zeros((rows*cols,3))
for i in range(colors):
    rgb_mat[:,i] = image[:,:,i].flatten()
rgb_mat -= np.mean(rgb_mat, axis=0)

I hope it helps!!

Jan Schlüter

Jan 26, 2017, 11:06:04 AM
to lasagne-users, samje...@gmail.com
    # sort eigenvalue in decreasing order
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:, idx]
    # sort eigenvectors according to same index
    evals = evals[idx]

Note that they're sorted anyway. You'd just need to reverse them:
evecs = evecs[:, ::-1]
evals = evals[::-1]
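
A quick check of this point, using a covariance matrix built from random stand-in data (an assumption for the example): np.linalg.eigh returns the eigenvalues in ascending order, so the argsort-based reordering and a plain reversal give the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.cov(rng.random((1000, 3)), rowvar=False)
evals, evecs = np.linalg.eigh(cov)      # eigenvalues come back in ascending order

idx = np.argsort(evals)[::-1]           # explicit sort, as in the quoted code
evals_rev, evecs_rev = evals[::-1], evecs[:, ::-1]   # plain reversal
```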

    # select the first n eigenvectors (n is desired dimension
    # of rescaled data array, or dims_rescaled_data)
    evecs = evecs[:, :dims_rescaled_data]
    # carry out the transformation on the data using eigenvectors
    # and return the re-scaled data, eigenvalues, and eigenvectors
    return evals, evecs

def perturbation_eigen( img):
    eVal, eVec = PCA(img, 1)

As Jonathan said, use PCA(img, 3). As written, your function returns 3 eigenvalues and only one eigenvector.

    pca = np.sqrt(eVal) * eVec
    perturb = (pca * np.random.randn(3) * .1).sum(axis = 1)
    print perturb
    imgvector = img.reshape(-1, 3)
    print '\n'
    print imgvector
    new_imgvector = np.add(imgvector, perturb) # error
    unshaped_img =  new_imgvector.reshape(220, 220, -1 )

Instead of reshaping the image vector, just extend the perturbation vector:
perturb = perturb[np.newaxis, np.newaxis, :]
return img + perturb

Any idea what's wrong, I think it's how I am adding the noise to the image, but other than that I'm not really sure

Reading the code again, it seems you apply this to a *single* image? The idea is to compute the PCA on all your training data (or a sizeable fraction), then just use different np.random.randn() samples per image. (You can also adjust the code to modify a batch of images at once, with different random vectors per image, without using a for loop.)

samje...@gmail.com

Jan 26, 2017, 3:10:32 PM
to lasagne-users
Thank you, I made the changes, couple of questions though. Do I mean center the data before normalizing it, but after reshaping it?  what does this do to the result? Also I found something weird when I print the image vectors when I'm done, the values look correct but when I display those rgb values, through opencv, as pixels the image is usually a white screen with some pixelated outlines. Any thoughts on why that might be?

Jonathan Quijas

Jan 26, 2017, 4:19:10 PM
The white screen with pixelated stuff probably means you have values larger than 1. Your pixel range should be strictly [0,1], so after you do the color jittering, check for values smaller than 0 and turn those to 0, and check for values larger than 1, and make those 1. The centering of the data should be done right after you create the rgb value matrix (num_pixels, 3). This is to make the computation of the covariance matrix MUCH faster, and to ensure the singular value decomposition of your rgb value matrix works fine. Let me know if this helps. Also, when you mention "Do I mean center the data before normalizing it, but after reshaping it?", what normalization step are you referring to?

samje...@gmail.com

Jan 26, 2017, 4:40:58 PM
to lasagne-users, jkqu...@gmail.com
I divided all the values in my image matrix by 255.0 to get them between 0 and 1, and from there I compute the covariance matrix and so on. At the end, after I add the noise, I use the code you suggested above:
img[img > 1.0] = 1.0
img[img < 0.0] = 0.0

Then I multiply all values in the matrix by 255. Is that not how I should normalize?
Jan Schlüter

Jan 26, 2017, 4:45:08 PM
to lasagne-users, jkqu...@gmail.com
Thank you, I made the changes, couple of questions though. Do I mean center the data before normalizing it, but after reshaping it?  what does this do to the result?

If you use np.cov(), there's no need to reshape it. If you switch to using SVD, then you'd mean-center the data and apply SVD to the data matrix directly instead of building the covariance matrix first. As Jonathan said, this is numerically more stable (but I guess it won't make much of a difference here).
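
The two routes can be compared directly. The sketch below uses random stand-in pixels with made-up per-channel scales (assumptions for the example): the squared singular values of the mean-centered data matrix, divided by n - 1, match the eigenvalues of the covariance matrix, and the right singular vectors match the eigenvectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in (num_pixels, 3) RGB matrix with different variance per channel
pixels = rng.random((5000, 3)) * np.array([1.0, 0.6, 0.3])
centered = pixels - pixels.mean(axis=0)        # SVD needs explicit mean-centering

# Route 1: eigendecomposition of the 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))

# Route 2: SVD of the centered data matrix itself
_, s, vt = np.linalg.svd(centered, full_matrices=False)
eigvals_svd = s ** 2 / (len(pixels) - 1)       # singular values -> variances
```

Note the ordering: np.linalg.svd returns singular values in descending order, whereas np.linalg.eigh returns eigenvalues in ascending order.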

Also I found something weird when I print the image vectors when I'm done, the values look correct but when I display those rgb values, through opencv, as pixels the image is usually a white screen with some pixelated outlines. Any thoughts on why that might be?

I assume your image data is still in range 0-255. After adding the perturbations, it will not be uint8 any more, but floating point. I'm not sure about OpenCV, but other libraries like matplotlib will assume floating-point data to range from 0.0 to 1.0. So you can either scale it down to that range, or cast it to uint8 again.
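
Both options could look roughly like this. The helper names to_uint8 and to_unit_float and the sample pixel are made up for illustration; the unsigned 8-bit type (uint8) is used because it is the one that holds values from 0 to 255.

```python
import numpy as np

def to_uint8(image_float_255):
    """Round and clip a float image on the 0..255 scale, then cast to uint8."""
    return np.clip(np.rint(image_float_255), 0, 255).astype(np.uint8)

def to_unit_float(image_float_255):
    """Alternative: rescale to 0..1 floats, the range expected for float images."""
    return np.clip(image_float_255 / 255.0, 0.0, 1.0)

perturbed = np.array([[[-3.2, 100.4, 260.0]]])   # made-up (1, 1, 3) pixel after jitter
as_bytes = to_uint8(perturbed)
as_floats = to_unit_float(perturbed)
```

Clipping before the cast matters: casting out-of-range floats straight to uint8 would not saturate them at 0 and 255.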

Jan Schlüter

Jan 26, 2017, 4:45:51 PM
to lasagne-users, jkqu...@gmail.com
If you use np.cov(), there's no need to reshape it.

Argh. To mean-center it, I meant.

samje...@gmail.com

Jan 26, 2017, 5:23:26 PM
to lasagne-users, jkqu...@gmail.com
Thank you, you were right about the data type, I assume leaving it in floating point would allow for more variation than unsigned int right? Or will the network round those values when i'm training it?

Jan Schlüter

Jan 27, 2017, 4:53:02 AM
to lasagne-users, jkqu...@gmail.com, samje...@gmail.com
Thank you, you were right about the data type, I assume leaving it in floating point would allow for more variation than unsigned int right? Or will the network round those values when i'm training it?

No, for the network it'd need to be converted to floating-point anyway. Also it's easier to train when the input is standardized (i.e., zero mean and unit standard deviation). So for now I wouldn't suggest int8 input (although I've seen some (non-Theano) code use int8 for the first-layer convolutions, and cuDNN v6 will support this as well with recent Pascal GPUs, to improve throughput).

alb3rto...@gmail.com

May 5, 2017, 8:24:14 AM
to lasagne-users

Here is my code, is it correct?

def com_PCA(image_array):
    imgvector = image_array.reshape(-1, 3)
    R = np.cov(imgvector.T)
    U,S,V = np.linalg.svd(R)
    print (U,S,V)
    eigenvalues = np.sqrt(S) # because cov is symmetric and psd
    return eigenvalues,U

def add_color_noise(image_array,eigenvalues,U,mean,batch_size,mu=0,sigma=0.1):
    # image_array = (image_array - np.stack([mean]*batch_size))
    distorted_images_array = np.zeros((image_array.shape[0],3,224,224))
    dx=dy=224
    for idx in range(distorted_images_array.shape[0]):
        scale_factor = random.uniform( 0.9 , 1.1)
        # image_array[idx] = ndimage.zoom(image_array[idx], (1,scale_factor, scale_factor))
        distorted_image = ndimage.zoom(image_array[idx], (1,scale_factor, scale_factor))
        w, h = distorted_image.shape[1:]
        x = random.randint(0, w - dx - 1)
        y = random.randint(0, h - dy - 1)
        distorted_image = distorted_image[:,x:x+dx,y:y+dy]
        scipy.misc.imsave('before_outfile.jpg', distorted_image.transpose(2, 1, 0))
        # Generate the \alpha samples
        samples = np.random.normal(mu, sigma, 3)
        augmentation = samples * eigenvalues
        noise = np.dot(U, augmentation.T)
        # Add the noise
        z = distorted_image.transpose(2,1,0) + noise
        scipy.misc.imsave('after_outfile.jpg', z)
        distorted_images_array[idx]= z.transpose(2,1,0)
    return distorted_images_array

And these are my outputs: [attached images not preserved]

Jan Schlüter

May 5, 2017, 9:08:24 AM
to lasagne-users, alb3rto...@gmail.com
Here is my code, is it correct?

Sorry, I don't have time to read it, but the basic steps are given in https://groups.google.com/d/msg/lasagne-users/meCDNeA9Ud4/wdqHJeolEAAJ -- can you compare your code to those?
Your images seem almost the same, the second one is just a little brighter, but that's of course possible when sampling.

mrlogh...@gmail.com

Jul 20, 2017, 7:52:03 AM
to lasagne-users
If you got a working implementation, could you please post it? I tried to re-implement it, but I'm not sure about the results.

Thanks.

Girish Mallya

Jun 17, 2021, 8:23:20 AM
to lasagne-users
Sorry to be reviving this thread after so many years, but just want to clarify a small detail.

On Friday, January 6, 2017 at 9:33:52 PM UTC Jan Schlüter wrote:
4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.
cov = np.cov(yourdata.T)  # this already includes mean removal. note the transpose.
eigvals, eigvects = np.linalg.eigh(cov)

5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2  + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).
I think you want the square root of the eigen values:
pca = np.sqrt(eigvals) * eigvects

So eigvects (returned by eigh) contains the eigenvectors in columns. Now, since we want a linear combination of these columns, shouldn't it be pca = eigvects * np.sqrt(eigvals)?
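
On the question of operand order, a small check suggests it makes no difference. NumPy's * is an element-wise product with broadcasting, so a length-3 vector is matched against the last axis of the 3x3 matrix (its columns) whichever side it is written on. The sketch below uses a covariance matrix from random stand-in data (an assumption for the example) and also compares both forms against an explicit matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.cov(rng.random((1000, 3)), rowvar=False)
eigvals, eigvects = np.linalg.eigh(cov)        # eigenvectors in columns

a = np.sqrt(eigvals) * eigvects                # order used in the quoted post
b = eigvects * np.sqrt(eigvals)                # order proposed in the question
c = eigvects @ np.diag(np.sqrt(eigvals))       # explicit column scaling

alphas = rng.normal(0.0, 0.1, size=3)
perturb_sum = (a * alphas).sum(axis=1)         # "multiply by row vector, then sum"
perturb_dot = eigvects @ (np.sqrt(eigvals) * alphas)   # same linear combination
```

Both orders scale column j of eigvects by the square root of eigenvalue j, so the resulting perturbation is the same linear combination of eigenvectors. The order would only matter with the matrix product operator @.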