Deep learning is rapidly gaining steam as more and more efficient architectures emerge from research papers around the world. These papers not only contain a wealth of information but also chart the birth of new deep learning architectures, yet they can often be difficult to parse. To understand one, you might have to read it multiple times, and perhaps its dependent papers as well. Inception is one such architecture.
The Inception network was a crucial milestone in the development of CNN image classifiers. Prior to this architecture, most popular CNN classifiers simply stacked convolution layers deeper and deeper in the hope of better performance.
The Inception architecture reuses a CNN block many times with different filter sizes (1×1, 3×3, 5×5, etc.), so let us create a class for a CNN block that takes input channels and output channels and applies a convolution followed by BatchNorm2d and a ReLU activation.
Then create a class for the Inception module with dimensionality reduction. Referring to the figure above, the module concatenates four branches: the output of a 1×1 filter; a 1×1 reduction followed by a 3×3 filter; a 1×1 reduction followed by a 5×5 filter; and a max-pool followed by a 1×1 filter.
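The two classes described above can be sketched as follows in PyTorch. This is a minimal sketch, not the exact code from any particular paper or repository; the class names and channel arguments are illustrative.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU, the reusable CNN block."""

    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


class InceptionModule(nn.Module):
    """Four parallel branches concatenated along the channel dimension."""

    def __init__(self, in_channels, out_1x1, red_3x3, out_3x3,
                 red_5x5, out_5x5, out_pool):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.branch1 = ConvBlock(in_channels, out_1x1, kernel_size=1)
        # Branch 2: 1x1 reduction, then 3x3 convolution
        self.branch2 = nn.Sequential(
            ConvBlock(in_channels, red_3x3, kernel_size=1),
            ConvBlock(red_3x3, out_3x3, kernel_size=3, padding=1),
        )
        # Branch 3: 1x1 reduction, then 5x5 convolution
        self.branch3 = nn.Sequential(
            ConvBlock(in_channels, red_5x5, kernel_size=1),
            ConvBlock(red_5x5, out_5x5, kernel_size=5, padding=2),
        )
        # Branch 4: max-pool, then 1x1 convolution
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            ConvBlock(in_channels, out_pool, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)],
            dim=1,
        )
```

Because every branch preserves the spatial size, the outputs can be concatenated channel-wise; the module's output channel count is simply `out_1x1 + out_3x3 + out_5x5 + out_pool`.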
A few months ago, I wrote a tutorial on how to classify images using Convolutional Neural Networks (specifically, VGG16) pre-trained on the ImageNet dataset with Python and the Keras deep learning library.
Back then, the pre-trained ImageNet models were separate from the core Keras library, requiring us to clone a free-standing GitHub repo and then manually copy the code into our projects.
These 1,000 image categories represent object classes that we encounter in our day-to-day lives, such as species of dogs, cats, various household objects, vehicle types, and much more. You can find the full list of object categories in the ILSVRC challenge here.
The state-of-the-art pre-trained networks included in the Keras core library represent some of the highest performing Convolutional Neural Networks on the ImageNet challenge over the past few years. These networks also demonstrate a strong ability to generalize to images outside the ImageNet dataset via transfer learning, such as feature extraction and fine-tuning.
This network is characterized by its simplicity, using only 3×3 convolutional layers stacked on top of each other in increasing depth. Reducing volume size is handled by max pooling. Two fully-connected layers, each with 4,096 nodes, are then followed by a softmax classifier (above).
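The "16" in VGG16 counts weight layers: 13 convolutional layers plus 3 fully-connected layers. A quick sketch of the standard VGG16 configuration makes the count concrete (here `"M"` marks a max-pooling layer, which carries no weights):

```python
# Standard VGG16 configuration: numbers are 3x3 conv output channels,
# "M" marks a 2x2 max-pooling layer (no learnable weights).
vgg16_cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

conv_layers = sum(1 for v in vgg16_cfg if v != "M")
fc_layers = 3  # two 4096-node FC layers plus the final classifier
pools = vgg16_cfg.count("M")

print(conv_layers, conv_layers + fc_layers)  # -> 13 16
print(224 // 2 ** pools)  # spatial size entering the FC layers -> 7
```

Five 2× poolings shrink the 224×224 input to 7×7 before the fully-connected layers.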
In 2014, 16 and 19 layer networks were considered very deep (although we now have the ResNet architecture which can be successfully trained at depths of 50-200 for ImageNet and over 1,000 for CIFAR-10).
Simonyan and Zisserman found training VGG16 and VGG19 challenging (specifically, convergence on the deeper networks), so to make training easier they first trained smaller versions of VGG with fewer weight layers (columns A and C).
While making logical sense, pre-training is a very time consuming, tedious task, requiring an entire network to be trained before it can serve as an initialization for a deeper network.
We no longer use pre-training (in most cases) and instead prefer Xavier/Glorot initialization or MSRA initialization (sometimes called He et al. initialization from the paper, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification). You can read more about the importance of weight initialization and the convergence of deep neural networks in All You Need Is a Good Init, Mishkin and Matas (2015).
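As a quick illustration, here is a numpy sketch of the two initialization formulas (the formulas themselves, not any library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 256

# Xavier/Glorot uniform: draw from U(-limit, limit),
# limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_glorot = rng.uniform(-limit, limit, size=(fan_in, fan_out))

# MSRA/He normal (suited to ReLU networks): std = sqrt(2 / fan_in)
he_std = np.sqrt(2.0 / fan_in)
w_he = rng.normal(0.0, he_std, size=(fan_in, fan_out))
```

Both schemes scale the initial weights by the layer's fan-in (and fan-out, for Glorot) so that activation variance stays roughly constant through the network, which is what makes deep networks trainable without layer-wise pre-training.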
First introduced by He et al. in their 2015 paper, Deep Residual Learning for Image Recognition, the ResNet architecture has become a seminal work, demonstrating that extremely deep networks can be trained using standard SGD (and a reasonable initialization function) through the use of residual modules:
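The core idea of the residual module is that the stacked layers learn a residual F(x) that is added back onto the input via an identity shortcut. A toy numpy sketch (two weight layers, fully-connected for simplicity rather than convolutional):

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Identity-shortcut residual block: output = relu(F(x) + x)."""
    out = relu(x @ w1)    # first weight layer + nonlinearity
    out = out @ w2        # second weight layer: this is F(x)
    return relu(out + x)  # add the shortcut, then the final activation


x = np.ones((1, 4))
zeros = np.zeros((4, 4))
# With zero weights, F(x) = 0 and the block reduces to the identity mapping:
print(residual_block(x, zeros, zeros))  # -> [[1. 1. 1. 1.]]
```

This is exactly why very deep ResNets remain trainable with standard SGD: a block can fall back to the identity (F(x) ≈ 0), so adding layers never has to hurt, and gradients flow through the shortcut unimpeded.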
The Inception V3 architecture included in the Keras core comes from the later publication by Szegedy et al., Rethinking the Inception Architecture for Computer Vision (2015) which proposes updates to the inception module to further boost ImageNet classification accuracy.
Line 7 gives us access to the imagenet_utils sub-module, a handy set of convenience functions that will make pre-processing our input images and decoding output classifications easier.
However, if we are using Inception or Xception, we need to set the inputShape to 299×299 pixels, followed by updating preprocess to use a separate pre-processing function that performs a different type of scaling.
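A simplified sketch of that branching logic (the function name is illustrative, not part of the Keras API; the Inception/Xception scaling shown maps pixels into [-1, 1], while the mean subtraction used by the other networks is omitted for brevity):

```python
def input_config(model_name):
    """Return the expected input (height, width) and a scaling function."""
    if model_name in ("inception", "xception"):
        # Inception/Xception expect 299x299 inputs and "tf"-style scaling
        # that maps pixel values from [0, 255] into [-1, 1].
        return (299, 299), lambda x: x / 127.5 - 1.0
    # VGG16/VGG19/ResNet expect 224x224 inputs (with ImageNet mean
    # subtraction, omitted here).
    return (224, 224), lambda x: x


shape, preprocess = input_config("inception")
print(shape)              # -> (299, 299)
print(preprocess(255.0))  # -> 1.0
```

The key point is simply that the input size and the pre-processing function must be chosen together, per model family.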
All updated examples in this blog post were gathered using TensorFlow 2.2. Previously, this blog post used Keras >= 2.0 with a TensorFlow backend (when they were separate packages); it was also tested with the Theano backend and confirmed to work with Theano as well.
As you can see from the examples in this blog post, networks pre-trained on the ImageNet dataset are capable of recognizing a variety of common day-to-day objects. I hope that you can use this code in your own projects!
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
If you are interested in learning more about deep learning and Convolutional Neural Networks (and how to train your own networks from scratch), be sure to take a look at my book, Deep Learning for Computer Vision with Python, available for order now.
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
PyImageSearch University is really the best Computer Vision "Masters" Degree that I wish I had when starting out. Being able to access all of Adrian's tutorials in a single indexed page and being able to start playing around with the code without going through the nightmare of setting up everything is just amazing. 10/10 would recommend.
The goal of this project is to create, foster, and disseminate solid conventions for Python project structure through the propagation and reuse of project archetypes, developed and submitted by the community, which can be used as templates for automatically setting up new Python projects.
The package name that will be used for your new project, e.g. inception_tools. This will be used for the name of the package, for the names of stub entry point files, and in the names of test modules. It will also be used as the relative path for the project_root argument in the event that it is omitted (see below).
Contributions and feedback are welcome. Contributions can be made by opening a pull request at the inception-tools repository and tagging @avanherick for review. Please see the Development section of this document for code style and branching guidelines.
This project was created to fill what looked like a lack of standardized conventions and practices for structuring Python projects, and out of the desire to avoid manually creating the same directory and file structures over and over again.
Using the default feature extractor (Inception v3 using the original weights from inception ref2), the input is expected to be mini-batches of 3-channel RGB images of shape (3, H, W). If the argument normalize is True, images are expected to have dtype float and values in the [0, 1] range; if normalize is set to False, images are expected to have dtype uint8 and take values in the [0, 255] range. All images will be resized to 299 × 299, which is the size of the original training data.
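The dtype/range contract described above can be sketched as a small validation helper. This is an illustrative numpy sketch of the stated expectations, not the library's actual validation code:

```python
import numpy as np


def check_fid_input(imgs, normalize):
    """Validate a mini-batch against the expectations described above."""
    assert imgs.ndim == 4 and imgs.shape[1] == 3, "expected shape (N, 3, H, W)"
    if normalize:
        # normalize=True: float images with values in [0, 1]
        assert np.issubdtype(imgs.dtype, np.floating), "normalize=True needs float"
        assert imgs.min() >= 0.0 and imgs.max() <= 1.0, "values must be in [0, 1]"
    else:
        # normalize=False: uint8 images, which by construction lie in [0, 255]
        assert imgs.dtype == np.uint8, "normalize=False needs uint8"
    return True


batch = np.random.default_rng(0).random((4, 3, 299, 299), dtype=np.float32)
print(check_fid_input(batch, normalize=True))  # -> True
```

Passing a float batch with normalize=False (or values outside the stated range) would trip one of the assertions, which mirrors the kind of mismatch that produces confusing errors downstream.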
Originating from Southeast Asia, Burmese pythons were introduced to Florida largely through the pet trade. Their release or escape into the wild is thought to be the result of the devastation caused by Hurricane Andrew in 1992, which allowed untold numbers of privately owned animals, as well as those for sale in pet stores, to escape. This led to an unchecked expansion of pythons in the Everglades, exacerbated by the absence of natural predators for these large reptiles. Over the past several decades, these invaders have wreaked havoc on native species, from birds to mammals, including deer, pigs, alligators, and other fauna, potentially disrupting food chains and overall ecosystem balance.
The Florida Python Challenge, conducted by the FWC in collaboration with entities such as the South Florida Water Management District, has been pivotal in engaging the public in the removal of these reptiles. Beyond mere removal, it has been a platform for educating the public about the fragile ecosystems of the Everglades and the repercussions of introducing non-native species. As the Challenge reaches its 10-year anniversary, it has facilitated the removal of many thousands of pythons, restoring some balance and affording native fauna a fighting chance. Its dual-pronged approach of eradication and education serves as a blueprint for addressing other invasive threats.
As the Southeast grapples with its invasive species, it's paramount to remember that proactive measures, research, community involvement, and education are the cornerstones of conservation. The success stories, challenges, and ongoing efforts in this region offer a comprehensive playbook for other areas confronting the menace of invasive species.
Could you post the dimension error?
Since the Inception model is quite deep, the auxiliary loss was used to stabilize the training.
If you are training from scratch, using the aux_loss might help.
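In practice, using the auxiliary loss means weighting it into the total loss during training (and ignoring the auxiliary head at eval time). A minimal sketch of the pattern; the 0.4 weight is the value commonly used in fine-tuning examples, and the numeric losses here are illustrative stand-ins for the two cross-entropy terms:

```python
def total_loss(main_loss, aux_loss, aux_weight=0.4):
    """Combine the main classifier loss with the auxiliary head's loss.

    The auxiliary term only stabilizes/regularizes training; at eval
    time the aux head is discarded and only the main output is used.
    """
    return main_loss + aux_weight * aux_loss


# e.g. main cross-entropy 1.0, auxiliary cross-entropy 0.5:
print(total_loss(1.0, 0.5))  # -> 1.2
```

With a model such as torchvision's inception_v3, the training-mode forward pass returns both the main logits and the auxiliary logits, so you would compute a loss on each output and combine them exactly as above before calling backward().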
The ImageNet-based script processes the batch iterations for one epoch, then before the second epoch it gave me that error. It goes to eval() after the first epoch to report the accuracy over that batch, right?
This time, there is a little confusion with the fc layer. I followed the finetune tutorial (I just want to run with aux_logits=True): for Inception, since there is only one aux_logit, the snippet below works fine.