Help with setting up, configuring and training a network to Deep Dream cats into images


Abe

Dec 31, 2018, 8:30:20 PM
to Caffe Users

 (Windows 7, sub-par GPU (GTX 860M; compute capability 5.0, though, so don’t discourage me), lots of time, machine learning noob)

What I’d like is for Deep Dream to do what it does, but instead of buildings, fish, people, etc., I only want it to dream cats into images (ideally, to be able to catify photographs with people in them).

What I did: trained a network on exclusively cat pictures (different sets for training and testing, but only class 0 in train.txt and test.txt, and num_output: 1 in train_val.prototxt), and I’m scared to see what it’d do in Deep Dream (probably just random noise? I don’t know).

My guess is that I need to train the network to identify both cats and people and then maximize the activations for cats, which I guess involves changing the weights of the neurons associated with cats.

Anaconda, CUDA (with cuDNN), Caffe and Deep Dream all seem to be working OK; however, I have close to no idea how to use them properly. Following this tutorial, which I guess is only for differentiating between dogs and cats, I managed to run caffe train without any fatal errors... but the network isn’t actually learning anything useful (I think). Anyway, my questions are:


1. Is there a way to just change a value somewhere so that bvlc_googlenet only finds cats in whatever images I feed it? (Not the main question, though.)
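To illustrate what I mean (purely a guess from skimming the deepdream notebook, and untested): maybe I could swap out objective_L2 so the gradient only comes from the ImageNet cat classes? If I’m reading the synset list right, indices 281-285 are the cats:

    # Hypothetical tweak to the deepdream notebook -- my guess, not tested.
    # Push only the ImageNet cat classes at bvlc_googlenet's classifier layer.
    CAT_CLASSES = [281, 282, 283, 284, 285]  # tabby, tiger cat, Persian, Siamese, Egyptian

    def objective_cats(dst):
        dst.diff[:] = 0.0               # zero out the usual L2 objective...
        dst.diff[0, CAT_CLASSES] = 1.0  # ...and backprop only the cat logits

    # then, presumably:
    # frame = deepdream(net, img, end='loss3/classifier', objective=objective_cats)

(Though I have no idea whether a fully-connected end layer even survives the notebook’s octave resizing.)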

2. To train a network that’s usable with Deep Dream to recognize cats (but not people or fish or buildings or anything else) in images that are photographs with people in them, and mainly just these kinds of images [BTW, it just so happens that I have a few hundred photos of me and my friends, which is what I’m really interested in catifying], what kind of training and testing data do I need (kinds of images (people, cats, dogs, landscapes), number of images, ratio of cats to not-cats in train and test, etc.)?


3. What should train.txt and test.txt look like? Should they have the same number of classes? Should I skip any class(es) in one of them? What are the rules for setting up the classes in train.txt and test.txt?
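For reference, every line of my current train.txt and test.txt is just "path label" with the label always 0 (file names invented here, but that’s the shape of it):

    cats/cat_0001.jpg 0
    cats/cat_0002.jpg 0
    cats/cat_0003.jpg 0

I assume the fix involves a second class (people as 1?), but that’s exactly what I’m asking.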


4. How should I change num_output? (The tutorial told me to modify this value only for layer fc8.)
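For instance, if I add people as a second class, I assume fc8 in train_val.prototxt ends up something like this (just my guess, one output per class):

    layer {
      name: "fc8"
      type: "InnerProduct"
      bottom: "fc7"
      top: "fc8"
      inner_product_param {
        num_output: 2  # cat + not-cat? (currently I have num_output: 1)
      }
    }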


5. What solver.prototxt settings should a system with 8 GB of RAM and 2 GB of VRAM use? The defaults are absurd for my setup, so I changed the following (not the full file): lr_policy: "step"; test_iter: 100; test_interval: 500; stepsize: 2500 (I’m only using 5000 images, all cats, at the moment); max_iter: 10000; iter_size: 5; weight_decay: 0.004; and batch_size: 10 for both data layers in train_val.prototxt. GPU-Z tells me I’m only using 1 GB of VRAM.
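Assembled, that is roughly this solver.prototxt (only the lines I touched; the net path below is a stand-in, and everything else is still at the tutorial’s values):

    net: "train_val.prototxt"  # stand-in path for illustration
    test_iter: 100
    test_interval: 500
    lr_policy: "step"
    stepsize: 2500
    max_iter: 10000
    iter_size: 5
    weight_decay: 0.004
    solver_mode: GPU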

6. Is the test net supposed to have 2 outputs (accuracy and loss) per class, and the train net just 1 (loss) per class? (With just cats I got 2 and 1, respectively.)


7. Do I actually need deploy.prototxt for AlexNet? The tutorial didn’t mention it, and I think the docs said it was just a copy of train_val.prototxt?
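From what I can piece together (please correct me if this is wrong), deploy.prototxt is train_val.prototxt with the data layers swapped for an input spec and the loss/accuracy layers replaced by a softmax, something like:

    layer {
      name: "data"
      type: "Input"
      top: "data"
      input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }  # AlexNet's input size, I believe
    }
    # ... the same layers as train_val.prototxt up through fc8 ...
    layer {
      name: "prob"
      type: "Softmax"
      bottom: "fc8"
      top: "prob"
    }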


8. How would I maximize the activations for cats? (Probably connected to question 1.)
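My vague, untested idea, adapted from the notebook’s make_step and assuming my net’s last layer is fc8 with class 0 = cat (both assumptions, obviously):

    import numpy as np

    def cat_step(net, img, step_size=1.5):
        # forward the image up to the classifier...
        net.blobs['data'].data[0] = img
        net.forward(end='fc8')
        # ...ask only for more cat...
        net.blobs['fc8'].diff[:] = 0.0
        net.blobs['fc8'].diff[0, 0] = 1.0  # class 0 = cat in my train.txt
        net.backward(start='fc8')
        # ...and take a normalized gradient-ascent step on the image itself
        g = net.blobs['data'].diff[0]
        return img + step_size / np.abs(g).mean() * g

Is that the right mental model, or does "maximizing activations" mean something else here?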


Also, please point out any flagrant mistakes in my setup (besides only having cat images, because I thought that was only logical). It’s essentially the same as the tutorial’s:

1. A Python script creates train.txt and test.txt: it just lists all images, divides them between the two files and adds the proper class number (which, in my case, guarantees 0% loss :D). (Sketched below.)
2. convert_imageset.exe creates the lmdb folders.
3. compute_image_mean.exe creates the mean image.
4. A caffe train call that results in no fatal exceptions, mostly because it’s not actually doing anything useful.
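For completeness, step 1 is essentially this (directory name invented for illustration):

    import os
    import random

    # list all cat images, shuffle, and split 80/20 into train/test
    images = [os.path.join('cats', f) for f in os.listdir('cats')]
    random.shuffle(images)
    split = int(0.8 * len(images))

    with open('train.txt', 'w') as f:
        for path in images[:split]:
            f.write('%s 0\n' % path)  # class 0 = cat (my only class so far)

    with open('test.txt', 'w') as f:
        for path in images[split:]:
            f.write('%s 0\n' % path)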


(If you need files/more information, just ask.)

 
