
What Do minHessian, Octaves and Octave Layers Mean in SURF (Speeded-Up Robust Features)? Q&A

My previous article on this blog discussed measuring image similarity with BoF in a large database. It was extracted from the discussion forum of an article posted on CodeProject, "Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)". This article is also taken from the comments section of that same CodeProject article. As I noted in my previous article, many people who use visual features do not properly understand the feature extraction and description algorithms, because these algorithms involve a lot of mathematics that is hard to follow with an average mathematical background. The question discussed in this article illustrates that point, and this lack of understanding may also lead users to limit their use of such features in their studies and applications.
Let's begin the discussion.

Q. I just wanted to ask why the minHessian value is 400, the number of octaves is 4, and the number of octave layers is 2. What would be the effect if I change these values? I'm just starting to learn about this and it is quite confusing. Also, how do you determine how many bags there should be? Why did you choose 200 for your code? I'm trying to extract the SURF features for more than 50 images, cluster them so I only have 1 matrix for each image (did I understand it correctly?), and then use the data to train SVM using Weka.

A. First of all, it would be really useful to read the original papers on SIFT (by Lowe) and on SURF.
For your first question: SURF features are detected by thresholding the determinant of the Hessian matrix computed over local patches of the image. In simple words, we first calculate the determinant of the Hessian for every patch in the image and then threshold it to find the robust feature points. minHessian controls this threshold, so if you increase it you get fewer feature points, and if you decrease it you get more. One of the most important properties of a feature is its repeatability (the likelihood of detecting the same feature again in another image of the same scene taken from a different camera angle). If you set the threshold too low, you get a lot of weak feature points with poor repeatability. If you set it too high, there will not be enough features to describe the image. You can keep 400 for minHessian, as it gives a sufficient number of feature points for natural images. In special cases, such as the medical domain, you need to fine-tune this value experimentally.
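The effect of the threshold can be illustrated with a small pure-Python sketch. It is not the SURF implementation: it assumes a synthetic image with one strong and one weak blob (made up for illustration), and it approximates the Hessian with simple finite differences instead of SURF's box filters. The names `make_image`, `hessian_det` and `count_detections` are hypothetical.

```python
import math

def make_image(size=32):
    """Synthetic grayscale image: one strong blob and one weak blob (illustrative data)."""
    blobs = [(8, 8, 200.0), (22, 22, 40.0)]  # (center x, center y, amplitude)
    sigma = 2.0
    img = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            for cx, cy, amp in blobs:
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                img[y][x] += amp * math.exp(-d2 / (2 * sigma ** 2))
    return img

def hessian_det(img, x, y):
    """Determinant of the Hessian at (x, y) using finite differences.
    Simplification: SURF itself uses box filters on an integral image."""
    dxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    dyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    dxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return dxx * dyy - dxy * dxy

def count_detections(img, min_hessian):
    """Number of pixels whose Hessian determinant exceeds the threshold."""
    size = len(img)
    return sum(
        1
        for y in range(1, size - 1)
        for x in range(1, size - 1)
        if hessian_det(img, x, y) > min_hessian
    )

img = make_image()
for threshold in (10, 400, 5000):
    print(threshold, count_detections(img, threshold))
```

With a low threshold both blobs respond, with a moderate one only the strong blob survives, and with a very high one nothing is left, which mirrors the trade-off described above.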

For the second question: an octave is a series of filter response maps obtained by convolving the same input image with filters of increasing size. Unlike SIFT, SURF does not need to rescale the image to detect features of different sizes; it uses filters of different sizes instead. With 4 octaves and 2 octave layers, this means:
first, we filter the image with sizes 9x9 and then 15x15 (the two octave layers of the first octave)
second, we filter the image with sizes 15x15 and then 27x27 (the two octave layers of the second octave)
third, we filter the image with sizes 27x27 and then 51x51 (the two octave layers of the third octave)
finally, we filter the image with sizes 51x51 and then 99x99 (the two octave layers of the fourth octave)

You can see that the filter size increment doubles from one octave to the next:
9 + (6*1) = 15
15 + (6*2) = 27
27 + (6*4) = 51
51 + (6*8) = 99

The base increment of 6 is chosen because it keeps the filter size odd, which guarantees that the filter has a central pixel.
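The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the scheme as described in this answer (start at 9, step 6, step doubles per octave, each octave starts at the second filter size of the previous one); the function name `surf_filter_sizes` is hypothetical, and a particular library may organize its layers differently internally.

```python
def surf_filter_sizes(n_octaves=4, n_octave_layers=2, first_size=9, base_step=6):
    """Filter side lengths per octave, following the scheme described in the text."""
    sizes = []
    start, step = first_size, base_step
    for _ in range(n_octaves):
        octave = [start + step * layer for layer in range(n_octave_layers)]
        sizes.append(octave)
        start += step  # the next octave starts at this octave's second filter size
        step *= 2      # the increment doubles from octave to octave
    return sizes

for number, octave in enumerate(surf_filter_sizes(), start=1):
    print(f"octave {number}: " + ", ".join(f"{s}x{s}" for s in octave))
```

Because the first size is odd and every increment is an even multiple of 6, all generated sizes stay odd.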

Finally, features are selected from the 2 x 4 = 8 response maps.
Increasing the number of octaves widens the range of feature sizes you can detect, from small features up to increasingly large ones. Increasing the number of octave layers lets you detect features at more intermediate sizes between the smallest and the largest. For example, assume that your image contains a cat, an elephant, a human and a pig. The following table shows which of them are detected with different values of the parameters.

Octaves | Octave layers | Who is detected
1       | 1             | cat
2       | 1             | cat, pig
1       | 2             | cat, pig
2       | 2             | cat, pig, human
3       | 1             | cat, pig, human
3       | 2             | cat, pig, human, elephant

The drawback is that more octaves increase the running time of the algorithm.
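To put rough numbers on how the detectable size range grows with the number of octaves, here is a small sketch. It assumes the filter sizes listed earlier (written here as a closed form, which is my own rewriting of that arithmetic) and the SURF paper's convention that the 9x9 filter corresponds to a Gaussian scale of about 1.2, so the scale grows in proportion to the filter size. The function names are hypothetical.

```python
def filter_size(octave, layer):
    """Filter side length; octave and layer are both 1-based.
    Closed form of the sizes listed in the text (9, 15 / 15, 27 / 27, 51 / 51, 99)."""
    return 3 * (2 ** octave * layer + 1)

def approx_scale(size):
    """Approximate Gaussian scale of a filter, assuming 9x9 corresponds to 1.2."""
    return 1.2 * size / 9

def scale_range(n_octaves, n_octave_layers):
    """Smallest and largest approximate scale covered by the given settings."""
    sizes = [
        filter_size(octave, layer)
        for octave in range(1, n_octaves + 1)
        for layer in range(1, n_octave_layers + 1)
    ]
    return approx_scale(min(sizes)), approx_scale(max(sizes))

for n_octaves in range(1, 5):
    smallest, largest = scale_range(n_octaves, 2)
    print(f"{n_octaves} octave(s): scale {smallest:.1f} to {largest:.1f}")
```

Each extra octave pushes the upper end of the range further out, at the cost of more filtering work.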

The number of bags should be determined experimentally. One publication reports that 200 bags performed well. If you are doing research, you have to find the best number of bags by measuring retrieval performance while varying the number of bags.

For the third question: it is easiest to push all the features into a single Mat object, because you can then use the OpenCV clustering function on it directly. Otherwise you have to cluster them manually and find the cluster centers yourself to serve as the vocabulary.
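The overall flow can be sketched in plain Python: stack the descriptors of all images into one list, cluster them once to get the vocabulary, and then describe each image by a histogram over that vocabulary. This is an illustrative stand-in, not the OpenCV code: the tiny k-means, the function names and the 2-D "descriptors" are all assumptions made for the example (real SURF descriptors have 64 or 128 values, and the number of bags would be far larger than 3).

```python
import random

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(centers, point):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(centers[i], point))

def kmeans(points, k, iterations=20, seed=0):
    """Very small k-means; returns k cluster centers (the visual vocabulary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def bof_histogram(descriptors, vocabulary):
    """Count how many descriptors of one image fall into each bag."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(vocabulary, d)] += 1
    return hist

# Hypothetical descriptors for three images.
image_descriptors = {
    "img_a": [[0.1, 0.2], [0.0, 0.1], [5.0, 5.1]],
    "img_b": [[5.2, 4.9], [4.8, 5.0], [0.2, 0.0]],
    "img_c": [[9.9, 0.1], [10.0, 0.0]],
}

# One pooled list of descriptors from every image, clustered once.
all_descriptors = [d for descs in image_descriptors.values() for d in descs]
vocabulary = kmeans(all_descriptors, k=3)

# One fixed-length histogram per image; these are the vectors a classifier would see.
histograms = {name: bof_histogram(descs, vocabulary)
              for name, descs in image_descriptors.items()}
print(histograms)
```

Note that clustering happens over the pooled descriptors, not per image; each image then contributes a single histogram whose length equals the number of bags.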


  1. If the size of the matrix can be changed, can we also use other sizes such as 2x2, 4x4, 8x8, 16x16, 32x32, 64x64 and 128x128?

