My previous article on this blog was a discussion on measuring image similarity with BoF in a large database. It was extracted from the forum of an article posted on CodeProject, "Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)". This article is also extracted from the comment section of the same CodeProject article. As I described in my previous article, many people who use visual features do not have a proper understanding of the feature extraction and description algorithms, because these algorithms involve a lot of mathematical procedures that are difficult to follow with average mathematical knowledge. The question discussed in this article confirms this, and the same lack of understanding may cause users to limit their use of such features in their studies and applications.

Let's begin the discussion.

**Q.** I just wanted to ask why the minHessian value is 400, the number of octaves is 4, and the number of octave layers is 2. What would be the effect if I change these values? I'm just starting to learn about this and it is quite confusing. Also, how do you determine how many bags there should be? Why did you choose 200 for your code? I'm trying to extract the SURF features for more than 50 images, cluster them so I only have 1 matrix for each image (did I understand it correctly?), and then use the data to train SVM using Weka.

**A.** First of all, it will be really useful if you can read the original papers on SIFT (by Lowe) and SURF.

**First** question: SURF features are detected by thresholding the determinant of the Hessian matrix of unit patches. In simple words, we first calculate the determinant of the Hessian for every patch in the image and then threshold it to find the robust feature points. minHessian controls this threshold, so if you increase it you will get fewer feature points, and if you decrease it you will get more. One of the most important properties of a feature is its repeatability (the tendency to re-detect the same feature in another image of the same scene taken from a different camera angle). If you set the threshold too low, you will get a lot of weak feature points with low repeatability. If you set it too high, there will not be enough features to describe the image. You can keep 400 for minHessian, as it gives enough feature points for natural images. In special cases, such as the medical domain, you need to fine-tune this value experimentally.
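To make the effect of the threshold concrete, here is a minimal pure-Python sketch. It is not the SURF implementation: it uses plain finite differences instead of SURF's box filters, and the names `hessian_determinant`, `detect` and the toy image are assumptions made only for illustration.

```python
def hessian_determinant(img):
    """Determinant of the Hessian at each interior pixel (finite differences)."""
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            dyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            dxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = dxx * dyy - dxy * dxy
    return det


def detect(img, min_hessian):
    """Keep only the pixels whose Hessian determinant exceeds the threshold."""
    det = hessian_determinant(img)
    return [(x, y)
            for y, row in enumerate(det)
            for x, value in enumerate(row)
            if value > min_hessian]


# Toy 9x9 image (hypothetical data): one strong blob and one weak blob.
image = [[0.0] * 9 for _ in range(9)]
image[2][2] = 40.0  # strong blob, determinant 6400
image[6][6] = 10.0  # weak blob, determinant 400

for threshold in (100, 1000, 10000):
    print(threshold, detect(image, threshold))
# 100   -> [(2, 2), (6, 6)]  both blobs pass
# 1000  -> [(2, 2)]          only the strong blob passes
# 10000 -> []                nothing is left to describe the image
```

The same trade-off applies to minHessian: a low value lets weak responses through, and a very high value leaves too few points.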

**Second** question: an octave represents a series of filter response maps obtained by convolving the same input image with filters of increasing size. Unlike other algorithms, SURF does not need to rescale the image to detect features of different sizes; it uses filters of different sizes instead. If we say 4 octaves and 2 octave layers, it means:

First, we filter the image with size 9x9 and then 15x15 (these are the two octave layers of the first octave).

Second, we filter the image with size 15x15 and then 27x27 (these are the two octave layers of the second octave).

Third, we filter the image with size 27x27 and then 51x51 (these are the two octave layers of the third octave).

Finally, we filter the image with size 51x51 and then 99x99 (these are the two octave layers of the fourth octave).

You can see that the step in filter size doubles from one octave to the next (6, 12, 24, 48), so the filter sizes grow on a logarithmic scale.

9 + (6*1) = 15

15 + (6*2) = 27

27 + (6*4) = 51

51 + (6*8) = 99

The step of 6 is chosen because it guarantees that the filter size stays odd, so the filter always has a center pixel.

Finally, features are selected from the 2 x 4 = 8 response maps (2 octave layers in each of the 4 octaves).
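The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the simplified scheme described in this answer (two layers per octave, with the step doubling each octave); the function name `filter_sizes` and its parameters are assumptions, and a real SURF implementation may lay out its layers differently.

```python
def filter_sizes(n_octaves, base=9, step=6):
    """Return one (first_layer, second_layer) filter-size pair per octave."""
    sizes = []
    size = base
    for octave in range(n_octaves):
        next_size = size + step * 2 ** octave  # the step doubles every octave
        sizes.append((size, next_size))
        size = next_size                       # next octave starts where this one ended
    return sizes


octaves = filter_sizes(4)
print(octaves)  # [(9, 15), (15, 27), (27, 51), (51, 99)]

# Every size is odd, so each filter has a center pixel.
print(all(s % 2 == 1 for pair in octaves for s in pair))  # True

# 2 layers in each of 4 octaves gives 8 response maps.
print(sum(len(pair) for pair in octaves))  # 8
```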

Increasing the number of octaves gives you the ability to detect both smaller and larger features in the image. Increasing the number of octave layers gives you the ability to detect features of many different sizes between the smallest and the largest. For example, assume that your image contains a cat, an elephant, a human and a pig. The following table shows what we detect with different values of the parameters.

| Octaves | Octave layers | Who is detected |
|---|---|---|
| 1 | 1 | cat |
| 2 | 1 | cat, pig |
| 1 | 2 | cat, pig |
| 2 | 2 | cat, pig, human |
| 3 | 1 | cat, pig, human |
| 3 | 2 | cat, pig, human, elephant |

The drawback is that more octaves increase the running time of the algorithm.

The number of bags should be determined by experiment. There is a publication reporting that 200 bags performed well. If you are doing research, you have to find the best number of bags by assessing the retrieval performance while varying the number of bags.
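Such an experiment is essentially a sweep over candidate vocabulary sizes. The sketch below shows only the shape of that loop. The helper `best_vocabulary_size` is a hypothetical name, and `placeholder_score` is a made-up curve that exists only so the example runs; it is not a measurement and must be replaced by a real retrieval evaluation on your own data.

```python
def best_vocabulary_size(candidates, evaluate):
    """Score every candidate number of bags and return (best, all_scores)."""
    scores = {k: evaluate(k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def placeholder_score(k):
    # NOT real data: an arbitrary curve standing in for
    # "build a vocabulary of size k, then measure retrieval performance".
    return 1.0 - abs(k - 200) / 1000.0


best, scores = best_vocabulary_size([50, 100, 200, 400, 800], placeholder_score)
print(best)
print(scores)
```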

**Third** question: it will be easier if you push all the features into one Mat object, because you can then use the OpenCV function directly to cluster them. Otherwise, you have to cluster them manually and find the cluster centers that make up the vocabulary.
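The idea of pooling all descriptors and then describing each image by a histogram can be sketched in pure Python. This is an illustration under assumptions, not the OpenCV code: the tiny `kmeans` below stands in for the OpenCV clustering function, the 2-D "descriptors" are toy values (real SURF descriptors are much longer), and all function names are hypothetical.

```python
import math


def nearest(centers, vec):
    """Index of the center closest to vec (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(centers[i], vec))


def kmeans(points, k, iterations=10):
    """Plain k-means, seeded with evenly spaced points for repeatability."""
    step = max(1, len(points) // k)
    centers = [list(p) for p in points[::step][:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bof_histogram(descriptors, vocabulary):
    """Count how many descriptors of one image fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(vocabulary, d)] += 1
    return hist


# Toy descriptors for two images (hypothetical data).
image_a = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]]
image_b = [[5.1, 5.0], [4.9, 5.0], [0.0, 0.0]]

all_descriptors = image_a + image_b        # one pooled matrix, one row per descriptor
vocabulary = kmeans(all_descriptors, k=2)  # cluster centers are the visual words

print(bof_histogram(image_a, vocabulary))  # [2, 1]
print(bof_histogram(image_b, vocabulary))  # [1, 2]
```

Note that the pooled matrix is used only to build the vocabulary. Each image then gets its own fixed-length histogram, and it is these histograms that would be passed to a classifier such as an SVM.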

If the matrix can be resized, can we use other sizes such as 2x2, 4x4, 8x8, 16x16, 32x32, 64x64 and 128x128?
