Color augmentation (a la Krizhevsky et al)


bawdyb

Jun 23, 2015, 5:59:30 AM
to lasagn...@googlegroups.com
Hi guys, 

I have been trying to implement the color intensity augmentation from Krizhevsky et al. 2012. However, the explanation provided in the paper is not clear to me. Could someone please explain what the authors mean by: "Specifically, we perform PCA on the set of RGB pixel values throughout the ImageNet training set."? Did they perform PCA over each of the channels separately? And why do they only take a 3-by-3 covariance matrix for the pixels?

I saw that Sander applied the same method in his Kaggle galaxy competition solution, but I don't understand how he obtained the PCA values: they are provided directly in the code as a vector (in realtime_augmentation.py):

colour_channel_weights = np.array([-0.0148366, -0.01253134, -0.01040762], dtype='float32')


Many thanks

Sander Dieleman

Jun 23, 2015, 6:13:22 AM
to lasagn...@googlegroups.com, baw...@gmail.com
Basically, you treat every pixel in every image as a data point. This gives you a ton of data points, each a vector with 3 values: R, G and B. You can then compute PCA on these data points, which means computing the covariance matrix of these vectors -- a 3x3 matrix.

PCA will give you 3 vectors with 3 components. You can then sample 3 scale parameters, and add scaled versions of each of these 3 vectors to all pixels in the image. For best results you should also scale them by the corresponding eigenvalues. This will perturb the image colours along these PCA axes.
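In rough numpy terms (just a sketch, with numpy imported as np; pixels stands for an (N, 3) float array of all RGB values, img for a single H x W x 3 image):

cov = np.cov(pixels.T)                                       # 3x3 covariance matrix
eigvals, eigvects = np.linalg.eigh(cov)                      # eigenvectors in the columns
alphas = np.random.normal(0, 0.1, 3)                         # one scale parameter per component
perturbation = (eigvects * (alphas * eigvals)).sum(axis=1)   # sum of the scaled eigenvectors
img += perturbation                                          # same RGB offset for every pixel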

Note that what I did for the galaxy challenge is not exactly the same: I noticed that one of the PCA vectors had a much larger eigenvalue than the others, so it was clearly dominant. That's why I didn't bother using the other two vectors. I only used the one with the largest eigenvalue, so this was basically equivalent to a brightness perturbation instead of a colour perturbation.
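In terms of the sketch above, that would be something like this (eigh returns the eigenvalues in ascending order, so the dominant component comes last; presumably the colour_channel_weights vector quoted earlier is such a precomputed, scaled eigenvector):

dominant = eigvects[:, -1] * eigvals[-1]     # the component with the largest eigenvalue
img += np.random.normal(0, 0.1) * dominant   # a single scalar sample -- essentially brightness noise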

Sander

bawdyb

Jun 23, 2015, 10:15:26 AM
to lasagn...@googlegroups.com, baw...@gmail.com
Thanks Sander, I tried to code it as two functions. compute_PCA() is called right after loading the data set, and then for each batch I call add_color_noise(). Is this correct?

import numpy as np


def compute_PCA(image_array):
    # Transpose and reshape the original image_array from N x channels x height x width
    # to N x height x width x channels
    imT = image_array.transpose(0, 2, 3, 1)
    reshaped_array = imT.reshape(imT.shape[0] * imT.shape[1] * imT.shape[2], 3)

    # Get covariance matrix, the eigenvectors and eigenvalues
    cov = np.dot(reshaped_array.T, reshaped_array) / reshaped_array.shape[0]
    U, S, V = np.linalg.svd(cov)

    eigenvalues = np.sqrt(S)  # because cov is symmetric and psd

    return eigenvalues, U


def add_color_noise(image_array, eigenvalues, U, mu=0, sigma=0.1):
    for idx in xrange(image_array.shape[0]):
        # Generate the \alpha samples
        samples = np.random.normal(mu, sigma, 3)

        augmentation = samples * eigenvalues
        noise = np.dot(U, augmentation.T)

        # Add the noise
        z = image_array[idx].transpose(1, 2, 0) + noise / eigenvalues  # Scale here with the corresponding eigenvalue ?
        image_array[idx] = z.transpose(2, 0, 1)


Sander Dieleman

Jun 23, 2015, 10:57:45 AM
to lasagn...@googlegroups.com, baw...@gmail.com
You multiply by the eigenvalues and then divide by them again; that doesn't make sense. You should not need to divide by them.

Sander

bawdyb

Jun 23, 2015, 2:59:22 PM
to lasagn...@googlegroups.com, baw...@gmail.com
True! Thanks for the correction!

liao...@gmail.com

Jun 22, 2016, 10:59:14 AM
to lasagne-users, baw...@gmail.com
By the way, you should normalize your data before doing PCA, i.e. convert 0-255 scale images to 0-1 scale. Otherwise, your color augmentation will produce much larger values. See the post here: http://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-analysis
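For example (a sketch):

images = images.astype('float32') / 255.0  # rescale to [0, 1] before computing the covariance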

Jan Schlüter

Jun 23, 2016, 8:05:30 AM
to lasagne-users, baw...@gmail.com
And you shouldn't do a for loop over image_array.shape[0]. Just create enough samples for all data points in your minibatch in a single np.random.normal call. Furthermore, reshape the noise so it's correctly broadcast instead of transposing the image data back and forth (image_array += noise[:, :, np.newaxis, np.newaxis]). This will make things a lot faster!
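Something like this (a sketch, reusing eigenvalues, U, mu and sigma from compute_PCA / add_color_noise above):

samples = np.random.normal(mu, sigma, (image_array.shape[0], 3))  # one alpha triple per image
noise = np.dot(samples * eigenvalues, U.T)                        # one RGB offset per image, shape (N, 3)
image_array += noise[:, :, np.newaxis, np.newaxis]                # broadcast over height and width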

Best, Jan

webs...@gmail.com

Dec 15, 2016, 6:17:08 PM
to lasagne-users, baw...@gmail.com
I am really struggling to implement this fancy PCA augmentation method, here is what I believe I must do (correct me if I am wrong):
1) Create a matrix where the first column contains all the red pixel data, the 2nd column all the green pixel data and the 3rd all the blue pixel data from all the images in the dataset.
2) Calculate the mean of every column and subtract it from every respective column.
3) Normalise the data between 0 and 1? (is this necessary? since all values are already between 0 and 255)
4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.
5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2  + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5). 

But it seems like from this code 
colour_channel_weights = np.array([-0.0148366, -0.01253134, -0.01040762], dtype='float32')
that the colour channel weights are very small, and multiplying them by a random number less than 1 will make them even smaller. So wouldn't the augmentation have only a super slim effect on the original data (like perturbing it by a minuscule amount of less than 1%)?

Am I on the right track here?

Jan Schlüter

Jan 6, 2017, 4:33:52 PM
to lasagne-users, baw...@gmail.com, webs...@gmail.com
I am really struggling to implement this fancy PCA augmentation method, here is what I believe I must do (correct me if I am wrong):
1) Create a matrix where the first column contains all the red pixel data, the 2nd column all the green pixel data and the 3rd all the blue pixel data from all the images in the dataset.
Correct. Let's call this matrix "yourdata".
 
2) Calculate the mean of every column and subtract it from every respective column.
Correct, but step 4) can do this for you. 
 
3) Normalise the data between 0 and 1? (is this necessary? since all values are already between 0 and 255)
Well, if you divide your data by 255, the eigenvalues will be 255*255 times smaller. The eigenvectors are the same. It depends on what scale the data is in when you apply the color perturbation.

4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.
cov = np.cov(yourdata.T)  # this already includes mean removal. note the transpose.
eigvals, eigvects = np.linalg.eigh(cov)
 
5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2  + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).
I think you want the square root of the eigenvalues:
pca = np.sqrt(eigvals) * eigvects
perturb = (pca * np.random.randn(3) * 0.1).sum(axis=1)  # multiply by row vector, then sum horizontally (the eigenvectors are in columns)
Now you have an RGB perturbation vector to add to your image, which should be in the same scale you used in step 3.
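For example (a sketch, assuming img is an H x W x 3 array in that same scale):

img = img + perturb[np.newaxis, np.newaxis, :]  # the same RGB offset added to every pixel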

Hope this helps!

jkqu...@gmail.com

Jan 17, 2017, 6:34:19 PM
to lasagne-users, baw...@gmail.com, webs...@gmail.com
Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.

Jan Schlüter

Jan 19, 2017, 1:42:17 PM
to lasagne-users
Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.

Good point, you may want to clip this back to the usual input range. It's surely possible -- you're sampling multiplication factors from a Gaussian. But note that I haven't checked back whether Alex used the square root of the eigenvalues or the eigenvalues themselves -- if the latter, the result depends on what scale the input data was in (still in 0--255, or already in 0--1).
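E.g. (a sketch, for inputs in the 0--1 range):

img = np.clip(img, 0.0, 1.0)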

Jonathan Quijas

Jan 19, 2017, 2:03:07 PM
to lasagn...@googlegroups.com
Thanks! I have been checking for values less than 0 or greater than 1, and making them 0 or 1 respectively. I am using the square root of the eigenvalues. The method now works like a charm! Also, I am unclear as to why we are taking the square root of the eigenvalues. Without this step, the factors are just too high and the entire image needs normalization, but the paper which introduced the "fancy PCA" color jitter idea never touches on taking the square root of the eigenvalues. Any insight on this? 

Thanks!! :D

On Thu, Jan 19, 2017 at 11:42 AM, Jan Schlüter <goo...@jan-schlueter.de> wrote:
Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.

Good point, you may want to clip this back to the usual input range. It's surely possible -- you're sampling multiplication factors from a Gaussian. But note that I haven't checked back whether Alex used the square root of the eigenvalues or the eigenvalues themselves -- if the latter, the result depends on what scale the input data was in (still in 0--255, or already in 0--1).


Jan Schlüter

Jan 19, 2017, 2:23:59 PM
to lasagne-users, jkqu...@gmail.com
Also, I am unclear as to why we are taking the square root of the eigenvalues.

To make the results independent of the scale of the images you began with. The eigenvalues of the covariance matrix are variances, which grow with the square of the data scale; their square roots are standard deviations, which grow linearly with it, so the perturbation keeps the same size relative to the data whatever scale it is in.
 
the paper which introduced the "fancy PCA" color jitter idea never touches on taking the square root of the eigenvalues. Any insight on this?

Maybe they didn't use the square root, but had the input range such that it worked fine.

samje...@gmail.com

Jan 25, 2017, 5:31:21 PM
to lasagne-users
Hey, so I am also trying to implement this PCA augmentation. My code seems to have a bug in it: when I try to add the RGB noise to the image, the noise either blanks the image or makes it look nothing like the original.
here's the code:

import numpy as np
from numpy import linalg as LA


def PCA(data, dims_rescaled_data):
    imgvector = data.reshape(-1, 3)
    # print imgvector
    # calculate the covariance matrix
    R = np.cov(imgvector.T)
    # calculate eigenvectors & eigenvalues of the covariance matrix
    # use 'eigh' rather than 'eig' since R is symmetric,
    # the performance gain is substantial
    evals, evecs = LA.eigh(R)
    # sort eigenvalues in decreasing order
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:, idx]
    # sort eigenvectors according to same index
    evals = evals[idx]
    # select the first n eigenvectors (n is desired dimension
    # of rescaled data array, or dims_rescaled_data)
    evecs = evecs[:, :dims_rescaled_data]
    # carry out the transformation on the data using eigenvectors
    # and return the re-scaled data, eigenvalues, and eigenvectors
    return evals, evecs


def perturbation_eigen(img):
    eVal, eVec = PCA(img, 1)
    pca = np.sqrt(eVal) * eVec
    perturb = (pca * np.random.randn(3) * .1).sum(axis=1)
    print perturb
    imgvector = img.reshape(-1, 3)
    print '\n'
    print imgvector
    new_imgvector = np.add(imgvector, perturb)  # error
    unshaped_img = new_imgvector.reshape(220, 220, -1)
    return unshaped_img

Any idea what's wrong? I think it's how I am adding the noise to the image, but other than that I'm not really sure.

Jonathan Quijas

Jan 25, 2017, 8:30:58 PM
to lasagn...@googlegroups.com
I see you are only using the eigenvalue and eigenvector of the component with the highest variation. Why not try all three? This will result in an actual "pixel jitter" in the form of [I_r, I_g, I_b]. Also, do not reshape the image as you do in:
imgvector = img.reshape(-1, 3)

I hope this helps:
#imgvector = img.reshape(-1, 3) #Comment this reshape step out, then add the following code
#   Add color jitters
img = np.add(image, perturb)
#   Make sure all values are in correct range
img[img > 1.0] = 1.0
img[img < 0.0] = 0.0
return img


I would also really suggest using Singular Value Decomposition (SVD) instead of the eigendecomposition. Numerically speaking, it is much more stable. Finally, did you center all your pixel data around their means? In other words, don't forget to compute the mean red value, mean green value, and mean blue value (three scalars). Then subtract from each value its corresponding color mean, sorta like this:
# Compute the centered rgb values matrix
(rows, cols, colors) = image.shape
rgb_mat = np.zeros((rows * cols, 3))
for i in range(colors):
    rgb_mat[:, i] = image[:, :, i].flatten()
rgb_mat -= np.mean(rgb_mat, axis=0)

I hope it helps!!


Jan Schlüter

Jan 26, 2017, 11:06:04 AM
to lasagne-users, samje...@gmail.com
    # sort eigenvalues in decreasing order
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:, idx]
    # sort eigenvectors according to same index
    evals = evals[idx]

Note that they're sorted anyway (eigh returns them in ascending order). You'd just need to reverse them:
evecs = evecs[:, ::-1]
evals = evals[::-1]

    # select the first n eigenvectors (n is desired dimension
    # of rescaled data array, or dims_rescaled_data)
    evecs = evecs[:, :dims_rescaled_data]
    # carry out the transformation on the data using eigenvectors
    # and return the re-scaled data, eigenvalues, and eigenvectors
    return evals, evecs

def perturbation_eigen(img):
    eVal, eVec = PCA(img, 1)

As Jonathan said, use PCA(img, 3). As it stands, your function returns 3 eigenvalues but only one eigenvector.

    pca = np.sqrt(eVal) * eVec
    perturb = (pca * np.random.randn(3) * .1).sum(axis=1)
    print perturb
    imgvector = img.reshape(-1, 3)
    print '\n'
    print imgvector
    new_imgvector = np.add(imgvector, perturb)  # error
    unshaped_img = new_imgvector.reshape(220, 220, -1)

Instead of reshaping the image vector, just extend the perturbation vector:
perturb = perturb[np.newaxis, np.newaxis, :]
return img + perturb


Any idea what's wrong? I think it's how I am adding the noise to the image, but other than that I'm not really sure.

Reading the code again, it seems you apply this to a *single* image? The idea is to compute the PCA on all your training data (or a sizeable fraction), then just use different np.random.randn() samples per image. (You can also adjust the code to modify a batch of images at once, with different random vectors per image, without using a for loop.)
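A sketch of the batched version (assuming images has shape (N, H, W, 3) and pca = np.sqrt(eVal) * eVec as above):

perturb = np.dot(np.random.randn(len(images), 3) * .1, pca.T)  # one random RGB offset per image
images = images + perturb[:, np.newaxis, np.newaxis, :]        # broadcast over height and width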

samje...@gmail.com

Jan 26, 2017, 3:10:32 PM
to lasagne-users
Thank you, I made the changes. A couple of questions though: do I mean-center the data before normalizing it, but after reshaping it? What does this do to the result? Also, I found something weird: when I print the image vectors at the end, the values look correct, but when I display those RGB values as pixels through OpenCV, the image is usually a white screen with some pixelated outlines. Any thoughts on why that might be?

Jonathan Quijas

Jan 26, 2017, 4:19:10 PM
to lasagn...@googlegroups.com
The white screen with pixelated stuff probably means you have values larger than 1. Your pixel range should be strictly [0, 1], so after you do the color jittering, check for values smaller than 0 and turn those to 0, and check for values larger than 1, and make those 1. The centering of the data should be done right after you create the RGB value matrix (num_pixels, 3). This is to make the computation of the covariance matrix MUCH faster, and to ensure the singular value decomposition of your RGB value matrix works fine. Let me know if this helps. Also, when you mention "Do I mean-center the data before normalizing it, but after reshaping it?", what normalization step are you referring to?


samje...@gmail.com

Jan 26, 2017, 4:40:58 PM
to lasagne-users, jkqu...@gmail.com
I divided all the values in my image matrix by 255.0 to get them between 0 and 1, then from there I find the covariance matrix and so on. At the end, after I add the noise, I use the code you suggested above:
img[img > 1.0] = 1.0
img[img < 0.0] = 0.0

then I multiply all values by 255. Is that not how I should normalize?

Jan Schlüter

Jan 26, 2017, 4:45:08 PM
to lasagne-users, jkqu...@gmail.com
Thank you, I made the changes. A couple of questions though: do I mean-center the data before normalizing it, but after reshaping it? What does this do to the result?

If you use np.cov(), there's no need to reshape it. If you switch to using SVD, then you'd mean-center the data and apply SVD to the data matrix directly instead of building the covariance matrix first. As Jonathan said, this is numerically more stable (but I guess it won't make much of a difference here).
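A sketch of that SVD route (yourdata being the (num_pixels, 3) matrix from earlier):

X = yourdata - yourdata.mean(axis=0)              # mean-center each colour channel
U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigvals = S ** 2 / (len(X) - 1)                   # recover the covariance eigenvalues, largest first
eigvects = Vt.T                                   # eigenvectors in the columns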
 
Also, I found something weird: when I print the image vectors at the end, the values look correct, but when I display those RGB values as pixels through OpenCV, the image is usually a white screen with some pixelated outlines. Any thoughts on why that might be?

I assume your image data is still in the range 0-255. After adding the perturbations, it will not be uint8 any more, but floating point. I'm not sure about OpenCV, but other libraries like matplotlib will assume floating-point data to range from 0.0 to 1.0. So you can either downscale it, or cast it to uint8 again.
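E.g. (a sketch):

img_disp = np.clip(img, 0, 255).astype(np.uint8)  # back to something the display library expects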

Jan Schlüter

Jan 26, 2017, 4:45:51 PM
to lasagne-users, jkqu...@gmail.com
If you use np.cov(), there's no need to reshape it.

Argh. To mean-center it, I meant.

samje...@gmail.com

Jan 26, 2017, 5:23:26 PM
to lasagne-users, jkqu...@gmail.com
Thank you, you were right about the data type. I assume leaving it in floating point would allow for more variation than unsigned int, right? Or will the network round those values when I'm training it?

Jan Schlüter

Jan 27, 2017, 4:53:02 AM
to lasagne-users, jkqu...@gmail.com, samje...@gmail.com
Thank you, you were right about the data type. I assume leaving it in floating point would allow for more variation than unsigned int, right? Or will the network round those values when I'm training it?

No, for the network it'd need to be converted to floating-point anyway. Also it's easier to train when the input is standardized (i.e., zero mean and unit standard deviation). So for now I wouldn't suggest int8 input (although I've seen some (non-Theano) code use int8 for the first-layer convolutions, and cuDNN v6 will support this as well with recent Pascal GPUs, to improve throughput).

alb3rto...@gmail.com

May 5, 2017, 8:24:14 AM
to lasagne-users


here's my code, is it correct?

import random

import numpy as np
import scipy.misc
from scipy import ndimage


def com_PCA(image_array):
    imgvector = image_array.reshape(-1, 3)
    R = np.cov(imgvector.T)

    U, S, V = np.linalg.svd(R)

    print(U, S, V)

    eigenvalues = np.sqrt(S)  # because cov is symmetric and psd

    return eigenvalues, U


def add_color_noise(image_array, eigenvalues, U, mean, batch_size, mu=0, sigma=0.1):
    # image_array = (image_array - np.stack([mean] * batch_size))
    distorted_images_array = np.zeros((image_array.shape[0], 3, 224, 224))

    dx = dy = 224

    for idx in range(distorted_images_array.shape[0]):
        scale_factor = random.uniform(0.9, 1.1)

        # image_array[idx] = ndimage.zoom(image_array[idx], (1, scale_factor, scale_factor))
        distorted_image = ndimage.zoom(image_array[idx], (1, scale_factor, scale_factor))

        w, h = distorted_image.shape[1:]
        x = random.randint(0, w - dx - 1)
        y = random.randint(0, h - dy - 1)

        distorted_image = distorted_image[:, x:x+dx, y:y+dy]

        scipy.misc.imsave('before_outfile.jpg', distorted_image.transpose(2, 1, 0))

        # Generate the \alpha samples
        samples = np.random.normal(mu, sigma, 3)

        augmentation = samples * eigenvalues
        noise = np.dot(U, augmentation.T)

        # Add the noise
        z = distorted_image.transpose(2, 1, 0) + noise

        scipy.misc.imsave('after_outfile.jpg', z)

        distorted_images_array[idx] = z.transpose(2, 1, 0)

    return distorted_images_array


and these are my outputs: [before and after images attached]


Jan Schlüter

May 5, 2017, 9:08:24 AM
to lasagne-users, alb3rto...@gmail.com
here's my code, is it correct?

Sorry, I don't have time to read it, but the basic steps are given in https://groups.google.com/d/msg/lasagne-users/meCDNeA9Ud4/wdqHJeolEAAJ -- can you compare your code to those?
Your images seem almost the same, the second one is just a little brighter, but that's of course possible when sampling.

mrlogh...@gmail.com

Jul 20, 2017, 7:52:03 AM
to lasagne-users
If you got a working implementation, could you please post it? I tried to re-implement it, but I'm not sure about the results.

Thanks.

Girish Mallya

Jun 17, 2021, 8:23:20 AM
to lasagne-users
Sorry to be reviving this thread after so many years, but I just want to clarify a small detail.

On Friday, January 6, 2017 at 9:33:52 PM UTC Jan Schlüter wrote:
4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.
cov = np.cov(yourdata.T)  # this already includes mean removal. note the transpose.
eigvals, eigvects = np.linalg.eigh(cov)
 
5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2  + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).
I think you want the square root of the eigenvalues:
pca = np.sqrt(eigvals) * eigvects
 
So eigvects (returned by eigh) contains the eigenvectors in columns. Now, since we want a linear combination of these columns, shouldn't it be pca = eigvects * np.sqrt(eigvals)?