Bag-of-Features Descriptor on SIFT Features with OpenCV (BoF-SIFT)

30 Aug 2017, CPOL license
An implementation of Bag-Of-Feature descriptor based on SIFT features using OpenCV and C++ for content based image retrieval applications.

Introduction

Content-based image retrieval (CBIR) is still an active research field. There are a number of approaches available to retrieve visual data from large databases, but almost all of them require an image digestion in their initial steps. Image digestion means describing an image using low-level features such as color, shape, and texture while removing unimportant details. Color histograms, color moments, dominant color, scalable color, shape contour, shape region, homogeneous texture, texture browsing, and edge histogram are some of the popular descriptors that are used in CBIR applications. Bag-Of-Feature (BoF) is another kind of visual feature descriptor which can be used in CBIR applications. In order to obtain a BoF descriptor, we first need to extract a feature from the image. This feature can be anything, such as SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Features), or LBP (Local Binary Patterns).

This article gives a brief description of BoF, SIFT, and how to obtain BoF from SIFT features (BoF-SIFT), together with the source code. BoF-SIFT has been implemented using OpenCV 2.4 and Visual C++ (VS2008), but you can easily modify the code to work with any flavor of C++. You can write the same code yourself if you go through a few OpenCV tutorials.

If you are a developer of CBIR applications or a researcher in visual content analysis, you may use this code in your application or as a baseline to compare against your own visual descriptor. Further, you can modify this code to obtain other BoF descriptors such as BoF-SURF or BoF-LBP.

Background

BoF and SIFT are totally independent algorithms. The next sections describe SIFT and then BoF.

SIFT - Scale Invariant Feature Transform

Point-like features are very popular in many fields, including 3D reconstruction and image registration. A good point feature should be invariant to geometrical transformations and to illumination changes. A point feature can be a blob or a corner. SIFT is one of the most popular feature extraction and description algorithms. It extracts blob-like feature points and describes them with a descriptor that is invariant to scale, illumination, and rotation.

Image 1

The above image shows how a SIFT point is described using a histogram of gradient magnitude and direction around the feature point. I'm not going to explain the whole SIFT algorithm in this article, but you can find the theoretical background of SIFT on Wikipedia or in David Lowe's original article on SIFT. For those with less interest in mathematics, I recommend reading this blog post.

Unlike the color histogram descriptor or LBP-like descriptors, the SIFT algorithm does not give an overall impression of the image. Instead, it detects blob-like features in the image and describes each of them with a descriptor that contains 128 numbers. As its output, it gives an array of point descriptors.
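To make the 128 numbers a bit more concrete: in Lowe's standard configuration, the region around a keypoint is divided into a 4 x 4 grid of cells, and each cell contributes an 8-bin histogram of gradient directions weighted by gradient magnitude (4 x 4 x 8 = 128). The following is only a simplified sketch of one such cell histogram, not OpenCV's implementation. The function name orientationHistogram is hypothetical, and Gaussian weighting, interpolation between bins, and rotation normalization are all left out.

C++
```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: one 8-bin gradient-orientation histogram for a small
// grayscale cell. Simplified sketch: no Gaussian weighting, no interpolation
// between bins, no rotation normalization.
std::vector<float> orientationHistogram(const std::vector<std::vector<float> >& cell) {
    assert(!cell.empty());
    const float kTwoPi = 6.28318530717958f;
    std::vector<float> hist(8, 0.0f);
    const int rows = static_cast<int>(cell.size());
    const int cols = static_cast<int>(cell[0].size());
    // Central differences need a one-pixel border, so skip the outermost pixels.
    for (int y = 1; y + 1 < rows; ++y) {
        for (int x = 1; x + 1 < cols; ++x) {
            float dx = cell[y][x + 1] - cell[y][x - 1];
            float dy = cell[y + 1][x] - cell[y - 1][x];
            float magnitude = std::sqrt(dx * dx + dy * dy);
            float angle = std::atan2(dy, dx);      // in (-pi, pi]
            if (angle < 0.0f) angle += kTwoPi;     // shift to [0, 2*pi)
            // Map the angle to one of 8 bins; "% 8" guards against rounding up to 8.
            int bin = static_cast<int>(angle / kTwoPi * 8.0f) % 8;
            hist[bin] += magnitude;                // weight the vote by gradient strength
        }
    }
    return hist;
}
```

For a cell whose brightness increases only from left to right, every gradient points along +x, so all of the weight ends up in bin 0 and the other seven bins stay empty.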

CBIR needs a global descriptor in order to match an image against visual data in a database or to retrieve the semantic concept of some visual content. We can use the array of point descriptors that the SIFT algorithm yields to obtain a global descriptor which gives an overall impression of the visual data for CBIR applications. There are several methods available to obtain that global descriptor from SIFT feature point descriptors, and BoF is one general method that can be used to do the task.

Bag-Of-Feature (BoF) Descriptor

BoF is one of the popular visual descriptors used for visual data classification. BoF is inspired by a concept called Bag of Words that is used in document classification. A bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words (or bag of features) is a sparse vector of occurrence counts over a vocabulary of local image features.
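As a small illustration of the document-classification idea, the sketch below counts how often each word of a fixed vocabulary occurs in a piece of text. The function name bagOfWords is only an assumption for this example; words that are not in the vocabulary are simply ignored.

C++
```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: occurrence counts of the vocabulary words in `text`.
// The result has one bin per vocabulary word, in vocabulary order.
std::vector<int> bagOfWords(const std::string& text,
                            const std::vector<std::string>& vocabulary) {
    assert(!vocabulary.empty());
    std::vector<int> counts(vocabulary.size(), 0);
    std::istringstream words(text);   // splits the text on whitespace
    std::string word;
    while (words >> word) {
        for (std::size_t i = 0; i < vocabulary.size(); ++i) {
            if (word == vocabulary[i]) {
                ++counts[i];
                break;
            }
        }
    }
    return counts;
}
```

With the vocabulary (cat, dog, fish) and the text "dog cat dog bird", the counts are 1, 2, and 0. BoF does the same thing with visual words in place of text words.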

BoF typically involves two main steps. The first step is obtaining the set of bags of features. This step is actually an offline process: we obtain the set of bags for a particular kind of feature once and then reuse it for creating BoF descriptors. In the second step, we assign each feature of a given image to the nearest of the bags created in the first step and then build a histogram that takes the bags as its bins. This histogram can be used to classify the image or video frame.
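The second step can be sketched in plain C++ as follows. This is only an illustration under my own assumptions, not the OpenCV code used later in the article: Descriptor, nearestWord, and bofHistogram are hypothetical names, the search is a brute-force nearest neighbor search, and the histogram is divided by the number of descriptors so that images with different numbers of keypoints stay comparable.

C++
```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;   // e.g. 128 values for SIFT

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Index of the visual word (bag) that is closest to the descriptor.
std::size_t nearestWord(const Descriptor& d, const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < vocabulary.size(); ++i) {
        float dist = squaredDistance(d, vocabulary[i]);
        if (dist < bestDist) {
            bestDist = dist;
            best = i;
        }
    }
    return best;
}

// Histogram with one bin per visual word, normalized to sum to 1.
std::vector<float> bofHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> hist(vocabulary.size(), 0.0f);
    for (std::size_t i = 0; i < descriptors.size(); ++i)
        hist[nearestWord(descriptors[i], vocabulary)] += 1.0f;
    if (!descriptors.empty())
        for (std::size_t b = 0; b < hist.size(); ++b)
            hist[b] /= static_cast<float>(descriptors.size());
    return hist;
}
```

The length of the resulting histogram depends only on the size of the vocabulary, not on how many feature points the image has. That is what turns a variable-length array of point descriptors into a fixed-length global descriptor.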

Bag-of-Features with SIFT

Let's see how we can build BoF with SIFT features.

  1. Obtain the set of bags of features.
    1. Select a large set of images.
    2. Extract the SIFT feature points of all the images in the set and obtain the SIFT descriptor for each feature point that is extracted from each image.
    3. Cluster the set of feature descriptors into the number of bags we defined and train the bags with the clustered feature descriptors (we can use the K-Means algorithm).
    4. Obtain the visual vocabulary.
  2. Obtain the BoF descriptor for a given image/video frame.
    1. Extract the SIFT feature points of the given image.
    2. Obtain the SIFT descriptor for each feature point.
    3. Match the feature descriptors with the vocabulary we created in the first step.
    4. Build the histogram.

The following image shows the above two steps clearly. (The image has been taken from http://www.sccs.swarthmore.edu/users/09/btomasi1/tagging-products.html)

Image 2
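The clustering in the first step can be sketched with a plain K-Means loop (Lloyd's algorithm). Again, this is a minimal illustration under my own assumptions rather than what OpenCV does internally: buildVocabulary and Descriptor are hypothetical names, the number of iterations is fixed, and the centers are simply initialized with the first k descriptors instead of a smarter seeding such as k-means++.

C++
```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;

// Hypothetical helper: cluster the descriptors into k bags with Lloyd's
// K-Means and return the k cluster centers (the visual vocabulary).
// Naive initialization: the first k descriptors are the starting centers.
std::vector<Descriptor> buildVocabulary(const std::vector<Descriptor>& descriptors,
                                        std::size_t k, int iterations) {
    assert(k > 0 && descriptors.size() >= k);
    const std::size_t dim = descriptors[0].size();
    std::vector<Descriptor> centers(descriptors.begin(), descriptors.begin() + k);

    for (int it = 0; it < iterations; ++it) {
        std::vector<Descriptor> sums(k, Descriptor(dim, 0.0f));
        std::vector<std::size_t> counts(k, 0);

        // Assignment step: add every descriptor to its nearest center.
        for (std::size_t n = 0; n < descriptors.size(); ++n) {
            std::size_t best = 0;
            float bestDist = std::numeric_limits<float>::max();
            for (std::size_t c = 0; c < k; ++c) {
                float dist = 0.0f;
                for (std::size_t j = 0; j < dim; ++j) {
                    float d = descriptors[n][j] - centers[c][j];
                    dist += d * d;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            for (std::size_t j = 0; j < dim; ++j)
                sums[best][j] += descriptors[n][j];
            ++counts[best];
        }

        // Update step: move each center to the mean of its members.
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;   // keep the center of an empty cluster
            for (std::size_t j = 0; j < dim; ++j)
                centers[c][j] = sums[c][j] / static_cast<float>(counts[c]);
        }
    }
    return centers;
}
```

Each returned center plays the role of one bag (visual word). With real SIFT data, the descriptors have 128 dimensions and there are many thousands of them, which is why this step is done offline.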

Using the Code

With OpenCV, we can implement BoF-SIFT with just a few lines of code. Make sure that you have installed OpenCV 2.3 or a higher version and Visual Studio 2008 or higher. The OpenCV version is a hard requirement, but you may use other flavors of C++ without any problems.

The code has two separate regions that are compiled and run independently. The first region is for obtaining the set of bags of features and the other region for obtaining the BoF descriptor for a given image/video frame. You need to run the first region of the code only once. After creating the vocabulary, you can use it with the second region of code any time. Modifying the code line below can switch between the two regions of code.

C++
#define DICTIONARY_BUILD 1 // set DICTIONARY_BUILD to 1 for Step 1. 0 for step 2 

Setting the DICTIONARY_BUILD constant to 1 will activate the following code region.

C++
#if DICTIONARY_BUILD == 1
 
//Step 1 - Obtain the set of bags of features.

//to store the input file names
char * filename = new char[100];        
//to store the current input image
Mat input;    

//To store the keypoints that will be extracted by SIFT
vector<KeyPoint> keypoints;
//To store the SIFT descriptor of current image
Mat descriptor;
//To store all the descriptors that are extracted from all the images.
Mat featuresUnclustered;
//The SIFT feature extractor and descriptor
SiftDescriptorExtractor detector;    

//I select 20 (1000/50) images from 1000 images to extract
//feature descriptors and build the vocabulary
for(int f=0;f<999;f+=50){        
    //create the file name of an image
    sprintf(filename,"G:\\testimages\\image\\%i.jpg",f);
    //open the file
    input = imread(filename, CV_LOAD_IMAGE_GRAYSCALE); //Load as grayscale                
    //detect feature points
    detector.detect(input, keypoints);
    //compute the descriptors for each keypoint
    detector.compute(input, keypoints,descriptor);        
    //put the all feature descriptors in a single Mat object 
    featuresUnclustered.push_back(descriptor);        
    //print the percentage
    printf("%i percent done\n",f/10);
} 

//Construct BOWKMeansTrainer
//the number of bags
int dictionarySize=200;
//define Term Criteria
TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
//retries number
int retries=1;
//necessary flags
int flags=KMEANS_PP_CENTERS;
//Create the BoW (or BoF) trainer
BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
//cluster the feature vectors
Mat dictionary=bowTrainer.cluster(featuresUnclustered);    
//store the vocabulary
FileStorage fs("dictionary.yml", FileStorage::WRITE);
fs << "vocabulary" << dictionary;
fs.release();

You can find out what each line of code does by going through the comment above the line. In summary, this part of the code reads a set of images from my hard disk, extracts SIFT feature points and their descriptors, concatenates the descriptors, clusters them into a number of bags (dictionarySize), and then produces a vocabulary by training the bags with the clustered feature descriptors. You can modify the path to the images and give your own set of images to build the vocabulary.

After running this code, you can see a file called dictionary.yml in your project directory. I suggest you open it with Notepad and see how the vocabulary appears. It may not make any sense to you, but you can get an idea about the structure of the file, which will be important if you work with OpenCV in the future.

If you run this code successfully, you can activate the next section by setting DICTIONARY_BUILD to 0. From here onwards, we don't need the first part of the code since we have already obtained a vocabulary and saved it in a file.

The following part is the next code section which achieves the second step.

C++
#else
    //Step 2 - Obtain the BoF descriptor for given image/video frame. 

    //prepare BOW descriptor extractor from the dictionary    
    Mat dictionary; 
    FileStorage fs("dictionary.yml", FileStorage::READ);
    fs["vocabulary"] >> dictionary;
    fs.release();    
    
    //create a nearest neighbor matcher
    Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
    //create Sift feature point detector
    Ptr<FeatureDetector> detector(new SiftFeatureDetector());
    //create Sift descriptor extractor
    Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);    
    //create BoF (or BoW) descriptor extractor
    BOWImgDescriptorExtractor bowDE(extractor,matcher);
    //Set the dictionary with the vocabulary we created in the first step
    bowDE.setVocabulary(dictionary);
 
    //To store the image file name
    char * filename = new char[100];
    //To store the image tag name - only used when saving the descriptor to a file
    char * imageTag = new char[10];
 
    //open the file to write the resultant descriptor
    FileStorage fs1("descriptor.yml", FileStorage::WRITE);    
    
    //the image file with the location. change it according to your image file location
    sprintf(filename,"G:\\testimages\\image\\1.jpg");        
    //read the image
    Mat img=imread(filename,CV_LOAD_IMAGE_GRAYSCALE);        
    //To store the keypoints that will be extracted by SIFT
    vector<KeyPoint> keypoints;        
    //Detect SIFT keypoints (or feature points)
    detector->detect(img,keypoints);
    //To store the BoW (or BoF) representation of the image
    Mat bowDescriptor;        
    //extract BoW (or BoF) descriptor from given image
    bowDE.compute(img,keypoints,bowDescriptor);
 
    //prepare the yml (somewhat similar to xml) file
    sprintf(imageTag,"img1");            
    //write the new BoF descriptor to the file
    fs1 << imageTag << bowDescriptor;        
 
    //You may use this descriptor for classifying the image.
            
    //release the file storage
    fs1.release();
#endif   

In this section, SIFT feature points and descriptors are calculated for a particular image, and each feature descriptor is matched with the vocabulary we created before.

C++
Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher); 

This line of code creates a matcher that matches the descriptors using the Fast Library for Approximate Nearest Neighbors (FLANN). There are some other types of matchers available, so you can explore them yourself. In general, approximate nearest neighbor matching works well.

Finally, the code outputs the Bag-Of-Feature descriptor and saves it to a file with the following line of code.

C++
fs1 << imageTag << bowDescriptor;

This descriptor can be used to classify the image into one of several classes. You may use an SVM or any other classifier to check the discriminative power and the robustness of this descriptor. Alternatively, you can directly compare the BoF descriptors of different images in order to measure their similarity.
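One simple way to compare two BoF descriptors directly is histogram intersection, which sums the bin-wise minimum of the two histograms. The sketch below assumes that both histograms have the same number of bins and are normalized to sum to 1, so the score lies between 0 (no overlap) and 1 (identical). The names histogramIntersection and mostSimilar are hypothetical, and other measures such as the Euclidean or chi-square distance may suit your data better.

C++
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Histogram intersection of two BoF descriptors with the same number of bins.
// For histograms that each sum to 1, the result lies in [0, 1].
float histogramIntersection(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float overlap = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        overlap += std::min(a[i], b[i]);
    return overlap;
}

// Index of the database descriptor that overlaps most with the query.
std::size_t mostSimilar(const std::vector<float>& query,
                        const std::vector<std::vector<float> >& database) {
    assert(!database.empty());
    std::size_t best = 0;
    float bestScore = -1.0f;
    for (std::size_t i = 0; i < database.size(); ++i) {
        float score = histogramIntersection(query, database[i]);
        if (score > bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

In a retrieval setting, you would compute the BoF descriptor of every database image once, store the descriptors, and then rank the stored descriptors against the descriptor of the query image.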

Points of Interest

I found that this code can easily be converted into a BoF implementation for any other feature, such as BoF-SURF, BoF-ORB, BoF-Opponent-SURF, or BoF-Opponent-SIFT.

You can find C++ and OpenCV source code for implementations of both BoF-SURF and BoF-ORB at the following link:

By changing the lines below, you can get the BoF descriptor with any other type of feature.

C++
SiftDescriptorExtractor detector;
Ptr<FeatureDetector> detector(new SiftFeatureDetector());
Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);

The latest versions of OpenCV include many feature detection and description algorithms, so you can apply those algorithms by modifying this code and determine the best method for your CBIR application or research.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)