Skip to main content

Measure Image Similarity with Bag of Feature (BOF) in a Large Database QA

Since my research is about visual data understanding, I thoroughly studied about many visual features and how these features can be described. It seemed that many people who used such features do not have a proper understanding over the feature extraction and description algorithms. These algorithms obviously contain a lot of mathematical procedures which are difficult to understand with an average knowledge about mathematics. The fact causes the users to limit the usage of such features in their studies and applications. 
I have written a few articles about the usage of some visual feature descriptors and other image processing techniques in any application, with the source-code hosted in CodeProject.

  1. Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)
  2. Bag-of-Features Descriptor on SURF and ORB Features (BoF-SURF and BoF-ORB)
  3. Image Segmentation using Unsupervised Watershed Algorithm with an Over-segmentation Reduction Technique
While answering the questions in the comments I felt that there are some subtopics in each of these articles which are needed to discuss separately. In this article I am trying to gather the important points from the article forum "Image similarity in very large database" attached to the original article "Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)". I will put them as questions and answer format.

Q. First, how big can the database be? can I create a BoF of, for example, 400,000 images, or would the program run off of memory?

A. In order to create BOF it is worthless to consider that much of images. But in case you used around 400000 images it may overflow the data structure inside the Mat object (not sure but that is the first thing which fill be failed based on the implementation and your system). If that does not fail, and there is not memory leak in the implementation of the functions within openCV, then there will be a great probability to create the BoF without any problem. But definitely you will have to wait with a cup of tea. 
One of the correct ways to handle such a large database is following the divide and conquer method. You can split the database to small parts and the obtain BoF. Further you can use a concept-hierarchy instead of a set of concepts in single level, so at a time you only have to classify an image to a small number of classes. 

Q. How can I use the descriptor taken from the input image to measure image similarity (I'm not sure how can I identify the image that matched from the BoF).

A. You can compare the BoF descriptors of two images in order to check the similarity, using a standard measurement such as Euclidean distance, Bhattacharyya distance etc. If you need to get a set of similar images from a database then you can use nearest neighbor algorithm with one of above said distance measurements. In some applications people only need to understand what is in the image. This kind of problems is called as visual semantic retrieval. In other words we can extract the meaning or the concept inside the image. In that case you can use an SVM to train several concepts with BOF descriptors of images, so the SVM will be able to classify a BoF descriptor of any given image to one of the trained concept.

Q. So, using BoF to consult large databases in real time is not a good approach if I want to get a good performance (1-2 seconds for a 400,000 images database)? 
What do you recommend in this case?

A. Let say we have 2 seconds to do the job, so each image will get 0.005 milliseconds if we do linear search. This is not possible with conventional hardware setup. But what if we create BOF descriptors offline and do the matching online? This is one of the approaches people use when they need to retrieve images without considering the semantics. Here are the steps,
  1. Create the BoF codebook using a sample of images from the database.
  2. Create BoF descriptor of each image and save in a database with referencing to the image.
  3. When the query image is given, extract the bof descriptor and do the matching with descriptors in database. Matching is simply comparing the descriptors. A descriptor is a matrix in simple word. Matrix calculation can be done in parallel with modern computers (search for SIMD). If you really need to further reduce the time you can use a GPU (May be with CUDA capability).
As an alternative solution, do not match image to image. Match what is inside the image. Say, one image contain waterfalls, so you can index the image as waterfall. Once the query image is given you can extract what inside the image and the simply do string search (or simple database search with indexing) which should bring the result within a few milliseconds in a high end computer. 

Q. How many images can a sample contain? I guess it depends on how accurate I want the matching, but I'm not sure. And just as a doubt: Using the BoF of the sample I took to create a BoF descriptor of an image that is not in the sample, would the matching with another image be good? Or not that accurate?

A. You are correct but the accuracy and the number of images in the sample is not closely related. Some times it is better to use less number of images while in most of the cases it is better to use somewhat larger sample. Since the SIFT descriptor contains 128 elements there can be infinite number of SIFT descriptors within the finite vector space. Hence more you give it adjusts the cluster centers fair enough to all the descriptors. But since we use a codebook with an extremely lower size compare to the infinite number of descriptors, a large sample may mislead the codebook.
I guess you are asking that what will happen if we use an image which was not in the sample set which we used to create the BOF codebook. So if this is your question, even for completely new image, the codebook can create a fair BoF histogram. Therefore images with somewhat similar distribution of SIFT feature vectors will be matched in most cases. But as always some times (rarely) you can see the matched image may be completely different to the query image. Because SIFT feature vector is not exactly what we human see in an image. SIFT feature is simply a histogram of gradient magnitudes on the 8 gradient angle bins. We human see colors, shapes, textures and if we do matching, then we do it semantically, not by low level features. 
So you can expect good accuracy with this but for sure it is not even near perfect.

Please read my original article in codeproject and your comments are welcome.


  1. Hi Ravimal, I'm not able to figure out how to find similar/same images from database given a query image once we have constructed the vocabulary using the BoF descriptors and a way to rank the results based on the similarity. Can you help?


Post a Comment

Popular posts from this blog

Sri Lanka Maps in Garmin GPS

Recently I received a Garmin GPS (nuvi 50) from my brother who is studying in China. The GPS looks fine but there are no Sri Lanka base maps installed in it. Then I tried to find a Sri Lanka road map that supports to the device. As I went through the articles I got to know that the format of the maps used in Garmin devices is a proprietary one. The map blocks are archived in to a single file which has the extension ".img" but not similar to DVD or floppy image file.

I found there are three methods to get Sri Lanka map to Garmin devices.

Download from the Garmin map resourcesDownload Sri Lanka maps from UMP (Unofficial Map Project)Download and convert maps from OpenStreetMap  (PS: I found this link of OpenStreetMap which seems to support routable maps and very easy to download maps of any country including Sri Lanka.
The first method is bit expensive and I don't think that it is worth to buy map from Garmin because they don't give enoug…

How to Send Executable (.exe, .ocx, .dll, .com, .bat) Files in Gmail Without Changing the File Extension?

Why Gmail doesn't like exe files? If you use gmail as your email service probably you should be getting frustrated with it when you want to send files with the extensions exe, ocx, dll, com or bat. These executable stands for some files which can be executed independently within a typical operating system and there is a huge probability to contain computer viruses or malware in these types of files. Since these kind of files can be executed independently any virus that the file carried will infect our computers very easily.

Although this is not a problem in other free email services like yahoo, as Google has grabbed a big part from the services which we use for our day to day cyber needs, we can't move in to another service just because of this problem.

What happen when we are trying to upload an executable file in Gmail? When we attach an executable file first it upload the whole file and check on several criteria such as file extension (whether it contains .exe, .dll etc) a…

What minHessian, Octaves and Layers mean in SURF (Speeded-up Robust Feature)? QA

My previous article in this blog is about a discussion on measuring image similarities with BOF in a large database. It is an extracted part from a forum of an article posted in CodeProject "Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)". This article is also an extracted part from the commenting section of the same article in the code project. As I described in my previous article, many people who used visual features do not have a proper understanding over the feature extraction and description algorithms because of these algorithms contain a lot of mathematical procedures which are difficult to understand with an average mathematical knowledge. The question which is about to discuss in this article has proved the above said fact and also the fact may cause the users to limit the usage of such features in their studies and applications.
Lets begin the discussion.

Q. I just wanted to ask why the minHessian value is 400, the number of octaves is 4, and th…