Jun 23, 2015, 5:59:30 AM

to lasagn...@googlegroups.com

Hi guys,

I have been trying to implement the color intensity augmentation from Krizhevsky et al. (2012). However, the explanation in the paper is not clear to me. Could someone please explain what the authors mean by: "Specifically, we perform PCA on the set of RGB pixel values throughout the ImageNet training set." Did they perform PCA over each of the channels? And why do they only use a 3 by 3 covariance matrix for the pixels?

I saw that Sander applied the same method in the Kaggle galaxy competition, but I don't understand how he obtained the PCA values; they are provided directly in the code as a vector (in realtime_augmentation.py):

colour_channel_weights = np.array([-0.0148366, -0.01253134, -0.01040762], dtype='float32')

Many thanks

Jun 23, 2015, 6:13:22 AM

to lasagn...@googlegroups.com, baw...@gmail.com

Basically, you have to treat every pixel in every image as a data point. This means you have a ton of data points which are vectors with 3 values: R, G and B. You can then compute PCA on these datapoints. This means you have to compute the covariance matrix of these vectors, which is a 3x3 matrix.

PCA will give you 3 vectors with 3 components. You can then sample 3 scale parameters, and add scaled versions of each of these 3 vectors to all pixels in the image. For best results you should also scale them by the corresponding eigenvalues. This will perturb the image colours along these PCA axes.

Note that what I did for the galaxy challenge is not exactly the same: I noticed that one of the PCA vectors had a much larger eigenvalue than the others, so it was clearly dominant. That's why I didn't bother using the other two vectors. I only used the one with the largest eigenvalue, so this was basically equivalent to brightness perturbation instead of colour perturbation.

Sander
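For readers following along, the procedure described above could be sketched in NumPy roughly like this. It is only a minimal sketch, not the code from the paper or from the galaxy challenge: the function names `rgb_pca` and `colour_perturbation`, the channels-last `(N, H, W, 3)` layout, the random stand-in images and the default `sigma=0.1` are all assumptions made for illustration.

```python
import numpy as np

def rgb_pca(images):
    """PCA over all pixels. images: (N, H, W, 3), an assumed layout."""
    pixels = images.reshape(-1, 3).astype(np.float64)  # one row per pixel
    cov = np.cov(pixels, rowvar=False)                  # 3x3 covariance (mean removed)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvectors are the columns
    return eigvals, eigvecs

def colour_perturbation(eigvals, eigvecs, sigma=0.1, rng=None):
    """One RGB offset: sum over i of alpha_i * eigval_i * eigvec_i."""
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.normal(0.0, sigma, size=3)             # 3 sampled scale parameters
    return eigvecs @ (alphas * eigvals)

rng = np.random.default_rng(0)
images = rng.random((8, 16, 16, 3))                     # random stand-in data
eigvals, eigvecs = rgb_pca(images)
offset = colour_perturbation(eigvals, eigvecs, rng=rng)
augmented = images[0] + offset                          # same offset for every pixel
```

The offset is a single RGB vector per image, so broadcasting adds it to all pixels at once.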

Jun 23, 2015, 10:15:26 AM

to lasagn...@googlegroups.com, baw...@gmail.com

Thanks Sander, I tried to code it as two functions. compute_PCA is called once, right after loading the data set, and then add_color_noise() is called for each batch. Is this correct?

    import numpy as np

    def compute_PCA(image_array):
        # Transpose and reshape the original image_array from N x channels x height x width to N x height x width x channels
        imT = image_array.transpose(0, 2, 3, 1)
        reshaped_array = imT.reshape(imT.shape[0] * imT.shape[1] * imT.shape[2], 3)
        # Get covariance matrix, the eigenvectors and eigenvalues
        cov = np.dot(reshaped_array.T, reshaped_array) / reshaped_array.shape[0]
        U, S, V = np.linalg.svd(cov)
        eigenvalues = np.sqrt(S)  # because cov is symmetric and psd
        return eigenvalues, U

    def add_color_noise(image_array, eigenvalues, U, mu=0, sigma=0.1):
        for idx in range(image_array.shape[0]):
            # Generate the \alpha samples
            samples = np.random.normal(mu, sigma, 3)
            augmentation = samples * eigenvalues
            noise = np.dot(U, augmentation.T)
            # Add the noise
            z = image_array[idx].transpose(1, 2, 0) + noise / eigenvalues  # Scale here with the corresponding eigenvalue ?
            image_array[idx] = z.transpose(2, 0, 1)

Jun 23, 2015, 10:57:45 AM

to lasagn...@googlegroups.com, baw...@gmail.com

You multiply by the eigenvalues and then divide by them again; that doesn't make sense. You should not need to divide by them.

Sander
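A version of the noise step without the extra division might look roughly like the following. This is a hedged sketch rather than the poster's final code: the parameter name `scales` (whatever per-axis scale factors one settles on), the `rng` argument and the choice to return a copy instead of modifying the input in place are assumptions.

```python
import numpy as np

def add_color_noise(image_array, scales, U, mu=0.0, sigma=0.1, rng=None):
    """image_array: (N, 3, H, W); scales: 3 per-axis factors; U: 3x3, eigenvectors in columns."""
    rng = np.random.default_rng() if rng is None else rng
    out = image_array.astype(np.float64)      # astype returns a copy by default
    for idx in range(out.shape[0]):
        alphas = rng.normal(mu, sigma, 3)     # one sample per PCA axis
        noise = U @ (alphas * scales)         # scaled once, no division afterwards
        out[idx] += noise[:, None, None]      # broadcast over height and width
    return out
```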

Jun 23, 2015, 2:59:22 PM

to lasagn...@googlegroups.com, baw...@gmail.com

True! Thanks for the correction!

Jun 22, 2016, 10:59:14 AM

to lasagne-users, baw...@gmail.com

By the way, you should normalize your data before doing PCA, i.e. convert the images from the 0-255 scale to the 0-1 scale. Otherwise, your color augmentation will produce much larger values. See the post here: http://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-analysis

Jun 23, 2016, 8:05:30 AM

to lasagne-users, baw...@gmail.com

And you shouldn't do a for loop over image_array.shape[0]. Just create enough samples for all data points in your minibatch in a single np.random.normal call. Furthermore, reshape the noise so it broadcasts correctly instead of transposing the image data back and forth (image_array += noise[:, :, np.newaxis, np.newaxis]). This will make things a lot faster!

Best, Jan
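The loop-free variant suggested above could be sketched as follows. The function name `add_color_noise_batch`, the `scales` parameter and the `rng` argument are assumptions for illustration; the only part taken directly from the suggestion is the single sampling call and the `noise[:, :, np.newaxis, np.newaxis]` broadcast over a channels-first batch.

```python
import numpy as np

def add_color_noise_batch(image_array, scales, U, sigma=0.1, rng=None):
    """image_array: (N, 3, H, W); scales: 3 per-axis factors; U: 3x3, eigenvectors in columns."""
    rng = np.random.default_rng() if rng is None else rng
    n = image_array.shape[0]
    alphas = rng.normal(0.0, sigma, size=(n, 3))   # all samples in a single call
    noise = (alphas * scales) @ U.T                # (N, 3): row i equals U @ (alphas[i] * scales)
    return image_array + noise[:, :, np.newaxis, np.newaxis]
```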

Dec 15, 2016, 6:17:08 PM

to lasagne-users, baw...@gmail.com

I am really struggling to implement this fancy PCA augmentation method, here is what I believe I must do (correct me if I am wrong):

1) Create a matrix where the first column contains all the red pixel data, the 2nd column all the green pixel data and the 3rd all the blue pixel data from all the images in the dataset.

2) Calculate the mean of every column and subtract it from every respective column.

3) Normalise the data between 0 and 1? (is this necessary? since all values are already between 0 and 255)

4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.

5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2 + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).

But judging from this code:

colour_channel_weights = np.array([-0.0148366, -0.01253134, -0.01040762], dtype='float32')

it seems that the colour channel weights are very small, and multiplying them by a random number less than 1 will make them even smaller. So wouldn't the augmentation have only a very slight effect on the original data (i.e. perturb it by a minuscule amount of less than 1%)?

Am I on the right track here?

Jan 6, 2017, 4:33:52 PM

to lasagne-users, baw...@gmail.com, webs...@gmail.com

> I am really struggling to implement this fancy PCA augmentation method, here is what I believe I must do (correct me if I am wrong):
>
> 1) Create a matrix where the first column contains all the red pixel data, the 2nd column all the green pixel data and the 3rd all the blue pixel data from all the images in the dataset.

Correct. Let's call this matrix "yourdata".

> 2) Calculate the mean of every column and subtract it from every respective column.

Correct, but step 4) can do this for you.

> 3) Normalise the data between 0 and 1? (is this necessary? since all values are already between 0 and 255)

Well, if you divide your data by 255, the eigenvalues will be 255*255 times smaller. The eigenvectors are the same. It depends on what scale the data is in when you apply the color perturbation.

> 4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.

    cov = np.cov(yourdata.T)  # this already includes mean removal. note the transpose.
    eigvals, eigvects = np.linalg.eigh(cov)

> 5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2 + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).

I think you want the square root of the eigenvalues:

    pca = np.sqrt(eigvals) * eigvects
    perturb = (pca * np.random.randn(3) * 0.1).sum(axis=1)  # multiply by row vector, then sum horizontally (the eigenvectors are in columns)

Now you have an RGB perturbation vector to add to your image, which should be in the same scale you used in step 3.

Hope this helps!
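The remark under step 3 (eigenvalues shrink by 255*255, eigenvectors stay the same) can be checked numerically. The snippet below is only an illustration on synthetic, made-up "pixels"; the variable names and the way the data is generated are assumptions, not part of the recipe above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated "pixels" on the 0-255 scale (made-up data, not real images).
brightness = rng.random((10000, 1)) * np.array([200.0, 180.0, 160.0])
pixels_255 = brightness + rng.random((10000, 3)) * np.array([40.0, 20.0, 10.0])
pixels_01 = pixels_255 / 255.0

vals_255, vecs_255 = np.linalg.eigh(np.cov(pixels_255.T))
vals_01, vecs_01 = np.linalg.eigh(np.cov(pixels_01.T))

ratio = vals_255 / vals_01                 # each entry is 255 * 255 up to rounding
overlap = np.abs(vecs_255.T @ vecs_01)     # identity matrix if the axes agree up to sign
```

The absolute value in `overlap` is there because an eigenvector is only defined up to its sign.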

Jan 17, 2017, 6:34:19 PM

to lasagne-users, baw...@gmail.com, webs...@gmail.com

Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.

Jan 19, 2017, 1:42:17 PM

to lasagne-users

> Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.

Good point, you may want to clip this back to the usual input range. It's surely possible -- you're sampling multiplication factors from a Gaussian. But note that I haven't checked back whether Alex used the square root of Eigenvalues or the Eigenvalues themselves -- if the latter, the result depends on what scale the input data was in (still in 0--255, or already in 0--1).
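Clipping back to the input range can be done in one call with `np.clip`. A tiny sketch, assuming the image is already on the 0-1 scale; the pixel values and the perturbation vector are made up for illustration:

```python
import numpy as np

img = np.array([[[0.98, 0.50, 0.02]]])        # a single pixel with values in [0, 1]
perturb = np.array([0.05, 0.00, -0.05])       # made-up RGB perturbation
clipped = np.clip(img + perturb, 0.0, 1.0)    # values outside [0, 1] are set to the nearest bound
```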

Jan 19, 2017, 2:03:07 PM

to lasagn...@googlegroups.com

Thanks! I have been checking for values less than 0 or greater than 1, and making them 0 or 1 respectively. I am using the square root of the eigenvalues. The method now works like a charm! Also, I am unclear as to why we are taking the square root of the eigenvalues. Without this step, the factors are just too high and the entire image needs normalization, but the paper which introduced the "fancy PCA" color jitter idea never touches on taking the square root of the eigenvalues. Any insight on this?

Thanks!! :D

On Thu, Jan 19, 2017 at 11:42 AM, Jan Schlüter <goo...@jan-schlueter.de> wrote:

> > Hey, is it normal to get values slightly larger than 1.0 and slightly less than 0.0? I am getting this.
>
> Good point, you may want to clip this back to the usual input range. It's surely possible -- you're sampling multiplication factors from a Gaussian. But note that I haven't checked back whether Alex used the square root of Eigenvalues or the Eigenvalues themselves -- if the latter, the result depends on what scale the input data was in (still in 0--255, or already in 0--1).

--

You received this message because you are subscribed to a topic in the Google Groups "lasagne-users" group.

To unsubscribe from this topic, visit https://groups.google.com/d/topic/lasagne-users/meCDNeA9Ud4/unsubscribe.

To unsubscribe from this group and all its topics, send an email to lasagne-users+unsubscribe@googlegroups.com.

To post to this group, send email to lasagn...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/lasagne-users/6fdbd224-8ebd-4687-b0bb-ceb05aabcabf%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Jan 19, 2017, 2:23:59 PM

to lasagne-users, jkqu...@gmail.com

> Also, I am unclear as to why we are taking the square root of the eigenvalues.

To make the results independent of the scale of the images you began with.

> the paper which introduced the "fancy PCA" color jitter idea never touches on taking the square root of the eigenvalues. Any insight on this?

Maybe they didn't use the square root, but had the input range such that it worked fine.
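One way to see the scale argument: the square root of an eigenvalue is a standard deviation, so it has the same units as the pixel values, whereas the eigenvalue itself is a variance and grows with the square of the scale. The following is a rough numerical illustration on made-up data; the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels_01 = rng.random((10000, 3)) * np.array([1.0, 0.8, 0.6])   # made-up data on a 0-1 scale
pixels_255 = pixels_01 * 255.0                                     # the same data on a 0-255 scale

vals_01 = np.linalg.eigvalsh(np.cov(pixels_01.T))
vals_255 = np.linalg.eigvalsh(np.cov(pixels_255.T))

# Perturbation size per PCA axis, relative to the data range (1 or 255):
rel_sqrt_01 = np.sqrt(vals_01) / 1.0
rel_sqrt_255 = np.sqrt(vals_255) / 255.0      # same as rel_sqrt_01
rel_plain_01 = vals_01 / 1.0
rel_plain_255 = vals_255 / 255.0              # 255 times rel_plain_01
```

With the square root, the same sigma gives the same relative perturbation on either scale; with the plain eigenvalues, the 0-255 data would be perturbed 255 times more strongly relative to its range.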

Jan 25, 2017, 5:31:21 PM

to lasagne-users

Hey, so I am also trying to implement this PCA augmentation. My code seems to have a bug in it: when I try to add the RGB noise to the image, the noise either blanks the image or makes it look nothing like the original.

Here's the code:

    import numpy as np
    from numpy import linalg as LA

    def PCA(data, dims_rescaled_data):
        imgvector = data.reshape(-1, 3)
        # print imgvector
        # calculate the covariance matrix
        R = np.cov(imgvector.T)
        # calculate eigenvectors & eigenvalues of the covariance matrix
        # use 'eigh' rather than 'eig' since R is symmetric,
        # the performance gain is substantial
        evals, evecs = LA.eigh(R)
        # sort eigenvalue in decreasing order
        idx = np.argsort(evals)[::-1]
        evecs = evecs[:, idx]
        # sort eigenvectors according to same index
        evals = evals[idx]
        # select the first n eigenvectors (n is desired dimension
        # of rescaled data array, or dims_rescaled_data)
        evecs = evecs[:, :dims_rescaled_data]
        # carry out the transformation on the data using eigenvectors
        # and return the re-scaled data, eigenvalues, and eigenvectors
        return evals, evecs

    def perturbation_eigen(img):
        eVal, eVec = PCA(img, 1)
        pca = np.sqrt(eVal) * eVec
        perturb = (pca * np.random.randn(3) * .1).sum(axis=1)
        print(perturb)
        imgvector = img.reshape(-1, 3)
        print('\n')
        print(imgvector)
        new_imgvector = np.add(imgvector, perturb)  # error
        unshaped_img = new_imgvector.reshape(220, 220, -1)
        return unshaped_img

Any idea what's wrong? I think it's how I am adding the noise to the image, but other than that I'm not really sure.

Jan 25, 2017, 8:30:58 PM

to lasagn...@googlegroups.com

I see you are only using the eigenvalue and eigenvector corresponding to the color channel with the highest variation. Why not try all three? This will result in an actual "pixel jitter" in the form of [I_r, I_g, I_b]. Also, do not reshape the image as you do in:

    imgvector = img.reshape(-1, 3)

Comment this reshape step out, then add the following code:

    # Add color jitters
    img = np.add(img, perturb)
    # Make sure all values are in correct range
    img[img > 1.0] = 1.0
    img[img < 0.0] = 0.0
    return img

I would also really suggest using singular value decomposition (SVD) instead of the eigendecomposition. Numerically speaking, it is much more stable. Finally, did you center all your pixel data around their means? In other words, don't forget to compute the mean red value, mean green value, and mean blue value (three scalars). Then subtract from each value its corresponding color mean, sort of like this:

```
# Compute centered rgb values matrix
(rows, cols, colors) = image.shape
rgb_mat = np.zeros((rows * cols, 3))
for i in range(colors):
    rgb_mat[:, i] = image[:, :, i].flatten()
rgb_mat -= np.mean(rgb_mat, axis=0)
```

I hope it helps!!


Jan 26, 2017, 11:06:04 AM

to lasagne-users, samje...@gmail.com

>     # sort eigenvalue in decreasing order
>     idx = np.argsort(evals)[::-1]
>     evecs = evecs[:, idx]
>     # sort eigenvectors according to same index
>     evals = evals[idx]

Note that eigh already returns them sorted, in ascending order. You'd just need to reverse them:

    evecs = evecs[:, ::-1]
    evals = evals[::-1]

>     # select the first n eigenvectors (n is desired dimension
>     # of rescaled data array, or dims_rescaled_data)
>     evecs = evecs[:, :dims_rescaled_data]
>     # carry out the transformation on the data using eigenvectors
>     # and return the re-scaled data, eigenvalues, and eigenvectors
>     return evals, evecs
>
>     def perturbation_eigen(img):
>         eVal, eVec = PCA(img, 1)

As Jonathan said, use PCA(img, 3). Now your function returns 3 eigenvalues and only one eigenvector.

>         pca = np.sqrt(eVal) * eVec
>         perturb = (pca * np.random.randn(3) * .1).sum(axis=1)
>         print(perturb)
>         imgvector = img.reshape(-1, 3)
>         print('\n')
>         print(imgvector)
>         new_imgvector = np.add(imgvector, perturb)  # error
>         unshaped_img = new_imgvector.reshape(220, 220, -1)

Instead of reshaping the image vector, just extend the perturbation vector:

    perturb = perturb[np.newaxis, np.newaxis, :]
    return img + perturb

> Any idea what's wrong? I think it's how I am adding the noise to the image, but other than that I'm not really sure.

Reading the code again, it seems you apply this to a *single* image? The idea is to compute the PCA on all your training data (or a sizeable fraction), then just use different np.random.randn() samples per image. (You can also adjust the code to modify a batch of images at once, with different random vectors per image, without using a for loop.)
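A sketch of that split (fit the PCA once on the training pixels, then draw fresh random samples per image) might look like this. The function names `fit_rgb_pca` and `jitter_batch`, the channels-last layout, the `max_pixels` subsampling and the default `sigma=0.1` are assumptions made for illustration, not something prescribed in this thread.

```python
import numpy as np

def fit_rgb_pca(train_images, max_pixels=100_000, rng=None):
    """train_images: (N, H, W, 3). Returns a 3x3 matrix whose columns are sqrt(eigval) * eigvec."""
    rng = np.random.default_rng() if rng is None else rng
    pixels = train_images.reshape(-1, 3)
    if pixels.shape[0] > max_pixels:
        # Use a random fraction of the pixels (assumed to be representative).
        pixels = pixels[rng.choice(pixels.shape[0], max_pixels, replace=False)]
    eigvals, eigvecs = np.linalg.eigh(np.cov(pixels.T))
    # Guard against tiny negative eigenvalues caused by rounding.
    return np.sqrt(np.maximum(eigvals, 0.0)) * eigvecs

def jitter_batch(batch, pca, sigma=0.1, rng=None):
    """batch: (N, H, W, 3). Adds a different random RGB offset to each image, without a loop."""
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.normal(0.0, sigma, size=(batch.shape[0], 3))
    perturb = alphas @ pca.T                    # (N, 3), one RGB offset per image
    return batch + perturb[:, np.newaxis, np.newaxis, :]
```

`fit_rgb_pca` would be called once after loading the data, and `jitter_batch` once per minibatch.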

Jan 26, 2017, 3:10:32 PM

to lasagne-users

Thank you, I made the changes. A couple of questions though: Do I mean center the data before normalizing it, but after reshaping it? What does this do to the result? Also, I found something weird: when I print the image vectors at the end, the values look correct, but when I display those RGB values as pixels through OpenCV, the image is usually a white screen with some pixelated outlines. Any thoughts on why that might be?

Jan 26, 2017, 4:19:10 PM

to lasagn...@googlegroups.com

The white screen with pixelated stuff probably means you have values larger than 1. Your pixel range should be strictly [0,1], so after you do the color jittering, check for values smaller than 0 and set those to 0, and check for values larger than 1 and set those to 1. The centering of the data should be done right after you create the RGB value matrix (num_pixels, 3). This is to make the computation of the covariance matrix MUCH faster, and to ensure the singular value decomposition of your RGB value matrix works fine. Let me know if this helps. Also, when you mention "Do I mean center the data before normalizing it, but after reshaping it?", what normalization step are you referring to?


Jan 26, 2017, 4:40:58 PM

to lasagne-users, jkqu...@gmail.com

I divided all the values in my image matrix by 255.0 to get them between 0 and 1, and from there I compute the covariance matrix and so on. At the end, after I add the noise, I use the code you suggested above:

    img[img > 1.0] = 1.0
    img[img < 0.0] = 0.0

Then I multiply all values in the matrix by 255. Is that not how I should normalize?


Jan 26, 2017, 4:45:08 PM

to lasagne-users, jkqu...@gmail.com

> Thank you, I made the changes. A couple of questions though: Do I mean center the data before normalizing it, but after reshaping it? What does this do to the result?

If you use np.cov(), there's no need to reshape it. If you switch to using SVD, then you'd mean-center the data and apply SVD to the data matrix directly instead of building the covariance matrix first. As Jonathan said, this is numerically more stable (but I guess it won't make much of a difference here).
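To make the relation between the two routes concrete: for a mean-centered data matrix with n rows, the squared singular values divided by n - 1 equal the eigenvalues of the covariance matrix, and the right singular vectors are the eigenvectors (up to sign and ordering). A small check on made-up data, with all variable names being assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random((5000, 3)) * np.array([1.0, 0.7, 0.4])   # made-up pixel matrix

# Route 1: SVD applied directly to the mean-centered data matrix.
centered = pixels - pixels.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
eigvals_svd = s ** 2 / (centered.shape[0] - 1)   # descending order
eigvecs_svd = vt.T                                # eigenvectors in columns

# Route 2: eigendecomposition of the covariance matrix.
eigvals_eigh, eigvecs_eigh = np.linalg.eigh(np.cov(pixels.T))  # ascending order
```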

> Also, I found something weird: when I print the image vectors at the end, the values look correct, but when I display those RGB values as pixels through OpenCV, the image is usually a white screen with some pixelated outlines. Any thoughts on why that might be?

I assume your image data is still in range 0-255. After adding the perturbations, it will not be an 8-bit integer type (uint8) any more, but floating point. I'm not sure about OpenCV, but other libraries like matplotlib will assume floating point data to range from 0.0 to 1.0. So you can either scale it down to that range, or cast it to uint8 again.
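Both options could be sketched like this. The pixel values and the perturbation are made up, and rounding before the cast is an assumption (a plain cast would truncate instead):

```python
import numpy as np

img_uint8 = np.array([[[10, 128, 250]]], dtype=np.uint8)   # one pixel on the 0-255 scale
perturb = np.array([12.3, -4.2, 9.9])                       # made-up offset on the same scale
jittered = img_uint8.astype(np.float32) + perturb           # floating point, may leave [0, 255]

# Option 1: round, clip and cast back to 8-bit unsigned integers.
as_uint8 = np.clip(np.rint(jittered), 0, 255).astype(np.uint8)
# Option 2: keep floating point, but scale down to [0, 1].
as_unit_float = np.clip(jittered / 255.0, 0.0, 1.0)
```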

Jan 26, 2017, 4:45:51 PM

to lasagne-users, jkqu...@gmail.com

> If you use np.cov(), there's no need to reshape it.

Argh. To mean-center it, I meant.

Jan 26, 2017, 5:23:26 PM

to lasagne-users, jkqu...@gmail.com

Thank you, you were right about the data type. I assume leaving it in floating point would allow for more variation than unsigned int, right? Or will the network round those values when I'm training it?

Jan 27, 2017, 4:53:02 AM

to lasagne-users, jkqu...@gmail.com, samje...@gmail.com

> Thank you, you were right about the data type. I assume leaving it in floating point would allow for more variation than unsigned int, right? Or will the network round those values when I'm training it?

No, for the network it'd need to be converted to floating-point anyway. Also it's easier to train when the input is standardized (i.e., zero mean and unit standard deviation). So for now I wouldn't suggest int8 input (although I've seen some (non-Theano) code use int8 for the first-layer convolutions, and cuDNN v6 will support this as well with recent Pascal GPUs, to improve throughput).
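Standardizing the input as mentioned above might look roughly like this. Computing the statistics per colour channel over the training set is an assumption (a single global mean and standard deviation would be another option), and the data below is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(10, 8, 8, 3)).astype(np.float32)  # random stand-in images

mean = train.mean(axis=(0, 1, 2))        # per-channel mean over the training set
std = train.std(axis=(0, 1, 2))          # per-channel standard deviation
standardized = (train - mean) / std      # zero mean, unit standard deviation per channel
```

The same `mean` and `std` would then be reused for validation and test data.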

May 5, 2017, 8:24:14 AM

to lasagne-users

Here is my code, is it correct?

    import random
    import numpy as np
    import scipy.misc
    from scipy import ndimage

    def com_PCA(image_array):
        imgvector = image_array.reshape(-1, 3)
        R = np.cov(imgvector.T)
        U, S, V = np.linalg.svd(R)
        print(U, S, V)
        eigenvalues = np.sqrt(S)  # because cov is symmetric and psd
        return eigenvalues, U

    def add_color_noise(image_array, eigenvalues, U, mean, batch_size, mu=0, sigma=0.1):
        # image_array = (image_array - np.stack([mean]*batch_size))
        distorted_images_array = np.zeros((image_array.shape[0], 3, 224, 224))
        dx = dy = 224
        for idx in range(distorted_images_array.shape[0]):
            scale_factor = random.uniform(0.9, 1.1)
            # image_array[idx] = ndimage.zoom(image_array[idx], (1,scale_factor, scale_factor))
            distorted_image = ndimage.zoom(image_array[idx], (1, scale_factor, scale_factor))
            w, h = distorted_image.shape[1:]
            x = random.randint(0, w - dx - 1)
            y = random.randint(0, h - dy - 1)
            distorted_image = distorted_image[:, x:x+dx, y:y+dy]
            scipy.misc.imsave('before_outfile.jpg', distorted_image.transpose(2, 1, 0))
            # Generate the \alpha samples
            samples = np.random.normal(mu, sigma, 3)
            augmentation = samples * eigenvalues
            noise = np.dot(U, augmentation.T)
            # Add the noise
            z = distorted_image.transpose(2, 1, 0) + noise
            scipy.misc.imsave('after_outfile.jpg', z)
            distorted_images_array[idx] = z.transpose(2, 1, 0)
        return distorted_images_array

and these are my outputs

May 5, 2017, 9:08:24 AM

to lasagne-users, alb3rto...@gmail.com

> Here is my code, is it correct?

Sorry, I don't have time to read it, but the basic steps are given in https://groups.google.com/d/msg/lasagne-users/meCDNeA9Ud4/wdqHJeolEAAJ -- can you compare your code to those?

Your images seem almost the same, the second one is just a little brighter, but that's of course possible when sampling.

Jul 20, 2017, 7:52:03 AM

to lasagne-users

If you got a working implementation, could you please post it? I tried to re-implement it, but I'm not sure about the results.

Thanks.

Jun 17, 2021, 8:23:20 AM

to lasagne-users

Sorry to be reviving this thread after so many years, but just want to clarify a small detail.

On Friday, January 6, 2017 at 9:33:52 PM UTC Jan Schlüter wrote:

> > 4) Apply PCA, i.e. create covariance matrix and compute the 3 eigenvectors and eigenvalues.
>
>     cov = np.cov(yourdata.T)  # this already includes mean removal. note the transpose.
>     eigvals, eigvects = np.linalg.eigh(cov)
>
> > 5) Then add eigenVec1 * a1 * eigenVal1 + eigenVec2 * a2 * eigenVal2 + eigenVec3 * a3 * eigenVal3 to each rgb channel in every image; Where 'a' is sampled from a gaussian with 0 mean and 0.1 std (or 0.5).
>
> I think you want the square root of the eigen values:
>
>     pca = np.sqrt(eigvals) * eigvects

So eigvects (returned by eigh) contains the eigenvectors in columns. Now, since we want a linear combination of these columns, shouldn't it be pca = eigvects * np.sqrt(eigvals)?
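As far as NumPy semantics go, the two spellings should be identical: elementwise `*` is commutative, and in either order the length-3 vector is broadcast along the last axis, so column j is scaled by sqrt(eigvals[j]). A quick check, where the matrices are made-up stand-ins rather than real PCA output:

```python
import numpy as np

eigvals = np.array([1.0, 4.0, 9.0])
eigvects = np.arange(9.0).reshape(3, 3)    # stand-in; columns play the role of eigenvectors

a = np.sqrt(eigvals) * eigvects            # spelling from the quoted post
b = eigvects * np.sqrt(eigvals)            # spelling proposed in the question
by_columns = np.stack(
    [np.sqrt(eigvals[j]) * eigvects[:, j] for j in range(3)], axis=1
)                                          # explicit column-by-column scaling

alphas = np.array([0.1, -0.2, 0.3])        # fixed stand-ins for the random samples
perturb = (a * alphas).sum(axis=1)         # linear combination of the scaled columns
```

Here `a`, `b` and `by_columns` all hold the same values, and `perturb` equals `eigvects @ (np.sqrt(eigvals) * alphas)`.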
