Whether you are tagging and categorizing your personal images, searching for stock photos for your company website, or simply looking for the right image for your next epic blog post, using text and keywords to describe something that is inherently visual is a real pain.
Perhaps by luck, I stumbled across one of the beach photographs. It was a beautiful, almost surreal beach shot. Puffy white clouds in the sky. Crystal clear ocean water, lapping at the golden sands. You could practically feel the breeze on your skin and smell the ocean air.
I took the sole beach image I had found and submitted it to my image search engine. Within seconds it returned all of the other beach photos, all without labeling or tagging a single image.
Searching by meta-data is only marginally different from the standard keyword-based search engines mentioned above. Search by meta-data systems rarely examine the contents of the image itself. Instead, they rely on textual clues, such as (1) manual annotations and tags supplied by humans, along with (2) automated contextual hints, such as the text that appears near the image on a webpage.
A great example of a Search by Meta-Data image search engine is Flickr. After uploading an image to Flickr you are presented with a text field to enter tags describing the contents of images you have uploaded. Flickr then takes these keywords, indexes them, and utilizes them to find and recommend other relevant images.
A great example of a Search by Example system is TinEye. TinEye is actually a reverse image search engine where you provide a query image, and then TinEye returns near-identical matches of the same image, along with the webpage that the original image appeared on.
Take a look at the example image at the top of this section. Here I have uploaded an image of the Google logo. TinEye has examined the contents of the image and returned to me the 13,000+ webpages that the Google logo appears on after searching through an index of over 6 billion images.
On Twitter you can upload photos to accompany your tweets. A hybrid approach would be to correlate the features extracted from the image with the text of the tweet. Using this approach, you could build an image search engine that leverages both contextual hints and a Search by Example strategy.
When building an image search engine we will first have to index our dataset. Indexing a dataset is the process of quantifying our dataset by utilizing an image descriptor to extract features from each image.
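The indexing step above can be sketched with toy data. The tutorial's own ColorDescriptor is swapped out here for a hypothetical mean-color `describe` function purely so the sketch stays self-contained; the loop structure is what matters:

```python
import numpy as np

# Hypothetical stand-in for the tutorial's color descriptor: here we
# simply take the mean of each channel so the sketch has no OpenCV
# dependency. The real descriptor extracts a color histogram instead.
def describe(image):
    # quantify the image as a small feature vector
    return image.reshape(-1, 3).mean(axis=0)

# a toy "dataset": imageID -> image (H x W x 3 array)
dataset = {
    "beach.png": np.full((4, 4, 3), 200, dtype=np.uint8),
    "forest.png": np.full((4, 4, 3), 60, dtype=np.uint8),
}

# indexing: run the descriptor over every image and store
# imageID -> feature vector
index = {imageID: describe(image) for imageID, image in dataset.items()}
```

In the real pipeline the index is written to disk (for example, as a CSV file) so that it only has to be computed once.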
Given two feature vectors, a distance function is used to determine how similar the two feature vectors are. The output of the distance function is a single floating point value used to represent the similarity between the two images.
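One common choice, shown here only as an illustration of the idea (two vectors in, a single non-negative float out, with 0 meaning identical), is the classic Euclidean distance:

```python
import math

def euclidean_distance(u, v):
    # one example distance function: the smaller the value,
    # the more similar the two feature vectors are
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(euclidean_distance([1.0, 2.0], [1.0, 2.0]))  # 0.0 -> identical
print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
```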
This dataset consists of various vacation trips from all over the world, including photos of the Egyptian pyramids, underwater diving with sea-life, forests in the mountains, wine bottles and plates of food at dinner, boating excursions, and sunsets across the ocean.
However, while RGB values are simple to understand, the RGB color space fails to mimic how humans perceive color. Instead, we are going to use the HSV color space, which maps pixel intensities onto a cylinder.
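In OpenCV the conversion is a single call to `cv2.cvtColor` with the `cv2.COLOR_BGR2HSV` flag. As a dependency-free illustration of the mapping itself, Python's standard-library `colorsys` module converts one RGB pixel to HSV (values scaled to [0, 1]):

```python
import colorsys

# Pure red in RGB; colorsys expects channel values in [0, 1]
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(h, s, v)  # hue 0.0, full saturation, full value
```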
So now that we have selected a color space, we need to define the number of bins for our histogram. Histograms are used to give a (rough) sense of the density of pixel intensities in an image. Essentially, our histogram will estimate the probability density of the underlying color distribution: in this case, the probability P of a pixel color C occurring in our image I.
Personally, I like an iterative, experimental approach to tuning the number of bins. This iterative approach is normally based on the size of my dataset. The smaller my dataset is, the fewer bins I use. And if my dataset is large, I use more bins, making my histograms larger and more discriminative.
This means that for every image in our dataset, whether the image is 36 x 36 pixels or 2000 x 1800 pixels, the image will be abstractly represented and quantified using only a list of 288 floating point numbers.
I think the best way to explain a 3D histogram is to use the conjunctive AND. A 3D HSV color descriptor will ask a given image how many pixels have a Hue value that falls into bin #1 AND a Saturation value that falls into bin #1 AND a Value intensity that falls into bin #1. The number of pixels that meets these requirements is then tabulated. This process is repeated for each combination of bins; however, it can be done in an extremely computationally efficient manner.
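The tutorial computes this with OpenCV's `cv2.calcHist`. As a stand-in that shows the same idea, NumPy's `histogramdd` tabulates 8 x 12 x 3 = 288 bin combinations over a batch of fake HSV pixels (note that OpenCV's 8-bit HSV representation keeps Hue in [0, 180) while Saturation and Value span [0, 256)):

```python
import numpy as np

rng = np.random.default_rng(0)
# 1,000 fake HSV pixels, using OpenCV's 8-bit HSV ranges:
# H in [0, 180), S and V in [0, 256)
pixels = rng.integers(0, [180, 256, 256], size=(1000, 3))

# 8 Hue bins AND 12 Saturation bins AND 3 Value bins
hist, _ = np.histogramdd(
    pixels, bins=(8, 12, 3), range=((0, 180), (0, 256), (0, 256)))

print(hist.shape)           # (8, 12, 3)
print(hist.flatten().size)  # 288 entries
assert hist.sum() == 1000   # every pixel lands in exactly one bin
```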
Remember, our goal is to describe each of these segments individually. The most efficient way of representing each of these segments is to use a mask. Only (x, y)-coordinates in the image that have a corresponding (x, y) location in the mask with a white (255) pixel value will be included in the histogram calculation. If the mask pixel at an (x, y)-coordinate is black (0), the image pixel is ignored.
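In the tutorial the mask is passed directly to `cv2.calcHist`, which performs this filtering internally. The behavior itself can be sketched with NumPy boolean indexing:

```python
import numpy as np

# a tiny single-channel "image" and a mask of the same shape
image = np.arange(16, dtype=np.uint8).reshape(4, 4)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 255  # only the top-left 2x2 region is "on"

# keep only pixels whose mask value is white (255);
# black (0) mask pixels are ignored
selected = image[mask == 255]
print(selected)  # [0 1 4 5]
```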
So now, for each of our segments, we make a call to the histogram method on Line 42, extracting the color histogram by passing the image we want to extract features from as the first argument and the mask representing the region we want to describe as the second argument.
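The tutorial builds these masks with OpenCV drawing calls (`cv2.rectangle` and `cv2.ellipse`). A NumPy-only sketch of the same five-region layout follows; the 0.75 scaling of the ellipse axes is an illustrative assumption, not necessarily the tutorial's exact value:

```python
import numpy as np

h, w = 100, 150
cX, cY = w // 2, h // 2

# elliptical mask for the image center, with axes at 75% of the
# half-width/half-height (an assumed proportion for illustration)
Y, X = np.ogrid[:h, :w]
axesX, axesY = int(w * 0.75) // 2, int(h * 0.75) // 2
ellip = ((X - cX) / axesX) ** 2 + ((Y - cY) / axesY) ** 2 <= 1.0

# four corner rectangles (startX, endX, startY, endY),
# each minus the elliptical center
segments = [(0, cX, 0, cY), (cX, w, 0, cY), (cX, w, cY, h), (0, cX, cY, h)]
masks = []
for (sx, ex, sy, ey) in segments:
    rect = np.zeros((h, w), dtype=bool)
    rect[sy:ey, sx:ex] = True
    masks.append(rect & ~ellip)
masks.append(ellip)  # fifth region: the center

print(len(masks))  # 5 region masks
```

Each mask would then be handed to the histogram method to describe that region alone.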
This list of numbers, or feature vector, contains representations for each of the 5 image regions we described in Step 1. Each region is represented by a histogram with 8 x 12 x 3 = 288 entries. Given 5 regions, our overall feature vector has 5 x 288 = 1,440 dimensions. Thus each image is quantified and represented using 1,440 numbers.
Finally, we initialize our results dictionary on Line 12. A dictionary is a good data type in this situation, as it will allow us to use the (unique) imageID for a given image as the key and the similarity to the query as the value.
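The overall search loop can be sketched end-to-end with made-up feature vectors; `chi2_distance` here is a minimal chi-squared implementation written for this sketch, not the tutorial's exact code:

```python
def chi2_distance(histA, histB, eps=1e-10):
    # chi-squared distance between two histograms; eps avoids
    # division by zero for empty bins
    return 0.5 * sum(((a - b) ** 2) / (a + b + eps)
                     for (a, b) in zip(histA, histB))

# toy index: imageID -> feature vector (made-up values)
index = {
    "beach.png": [0.7, 0.2, 0.1],
    "forest.png": [0.1, 0.8, 0.1],
}
query = [0.6, 0.3, 0.1]

results = {}
for (imageID, features) in index.items():
    # the unique imageID is the key, the distance to the query the value
    results[imageID] = chi2_distance(features, query)

# sort so the most similar images (smallest distance) come first
ranked = sorted(results.items(), key=lambda kv: kv[1])
print(ranked[0][0])  # beach.png
```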
Images that have a chi-squared distance of 0 are deemed identical to each other. As the chi-squared distance increases, the images are considered less similar to each other.
Similarly, if we are building an image search engine for our vacation photos and we submit an image of a sailboat on a blue ocean and white puffy clouds, we would expect to get similar ocean view images back from our image search engine.
Be sure to pay close attention to our query image. Notice how the sky is a brilliant shade of blue in the upper regions of the image, and how we have brown and tan desert and buildings at the bottom and center of the image.
We utilized a color histogram to characterize the color distribution of our photos. Then, we indexed our dataset using our color descriptor, extracting color histograms from each of the images in the dataset.