Image Category Classification Using Bag of Features

This example shows how to use a bag-of-features approach for image category classification, a technique also often called bag of words. Visual image categorization is the process of assigning a category label to an image under test. Categories can contain images of just about anything, for example dogs, cats, trains, or boats.

Load Data

Unzip a collection of images to use for this example.

unzip('MerchData.zip');

Load the image collection using an imageDatastore to help you manage the data. Because imageDatastore operates on image file locations, and therefore does not load all the images into memory, it is safe to use on large image collections.

imds = imageDatastore('MerchData','IncludeSubfolders',true,'LabelSource','foldernames');

You can easily inspect the number of images per category as well as category labels as shown below:

tbl = countEachLabel(imds)
tbl=5×2 table
             Label             Count
    _______________________    _____

    MathWorks Cap               15  
    MathWorks Cube              15  
    MathWorks Playing Cards     15  
    MathWorks Screwdriver       15  
    MathWorks Torch             15  

Note that the labels were derived from directory names used to construct the ImageDatastore, but can be customized by manually setting the Labels property of the ImageDatastore object. Next, display a few of the images to get a sense of the type of images being used.

figure
montage(imds.Files(1:16:end))

[Figure: montage of sample images from the collection.]

Note that for the bag of features approach to be effective, the majority of the object must be visible in the image.
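As mentioned earlier, labels do not have to come from folder names; they can be assigned through the Labels property. The following is a minimal sketch of deriving custom labels from file paths. The file list and the lowercase naming rule are hypothetical, chosen only for illustration, and the final assignment to a datastore is left as a comment because it needs the real image collection.

```matlab
% Hypothetical file list standing in for imds.Files.
files = {fullfile('MerchData','Cap','a.jpg'); ...
         fullfile('MerchData','Cap','b.jpg'); ...
         fullfile('MerchData','Cube','c.jpg')};

% Take the parent folder of each file, then the last part of that folder path.
folders      = cellfun(@fileparts, files, 'UniformOutput', false);
[~, names]   = cellfun(@fileparts, folders, 'UniformOutput', false);

% Illustrative customization: use lowercase category names.
customLabels = categorical(lower(names));
disp(customLabels)

% With the datastore from this example, the labels could then be assigned with:
%   imds.Labels = customLabels;
```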

Prepare Data for Training

Separate the sets into training and validation data. Pick 60% of images from each set for the training data and the remainder, 40%, for the validation data. Randomize the split to avoid biasing the results.

[trainingSet, validationSet] = splitEachLabel(imds, 0.6, 'randomize');

The above call returns two imageDatastore objects ready for training and validation tasks.
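As a quick check on the split, the sketch below works out the expected set sizes from the counts reported earlier (15 images in each of 5 categories). That the per-category training count is the rounded fraction of the category size is an assumption made here for illustration.

```matlab
% Expected split sizes, assuming 15 images in each of 5 categories.
imagesPerCategory = 15;
numCategories     = 5;
trainFraction     = 0.6;

% Assumption: each category contributes round(fraction * count) training images.
trainPerCategory = round(trainFraction * imagesPerCategory);
valPerCategory   = imagesPerCategory - trainPerCategory;

numTrain = trainPerCategory * numCategories;
numVal   = valPerCategory * numCategories;
fprintf('Training: %d images, validation: %d images\n', numTrain, numVal);

% With the datastores from this example, the actual counts can be inspected with:
%   countEachLabel(trainingSet)
%   countEachLabel(validationSet)
```

These totals should match the image counts reported when the classifier is trained and evaluated.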

Create a Visual Vocabulary and Train an Image Category Classifier

Bag of words is a technique adapted to computer vision from the world of natural language processing. Since images do not actually contain discrete words, we first construct a "vocabulary" of SURF features representative of each image category.

This is accomplished with a single call to the bagOfFeatures function, which:

  1. extracts SURF features from all images in all image categories

  2. constructs the visual vocabulary by reducing the number of features through quantization of feature space using K-means clustering

bag = bagOfFeatures(trainingSet);
Creating Bag-Of-Features.
-------------------------
* Image category 1: MathWorks Cap
* Image category 2: MathWorks Cube
* Image category 3: MathWorks Playing Cards
* Image category 4: MathWorks Screwdriver
* Image category 5: MathWorks Torch
* Selecting feature point locations using the Grid method.
* Extracting SURF features from the selected feature point locations.
** The GridStep is [8 8] and the BlockWidth is [32 64 96 128].

* Extracting features from 45 images...done. Extracted 141120 features.

* Keeping 80 percent of the strongest features from each category.

* Creating a 500 word visual vocabulary.
* Number of levels: 1
* Branching factor: 500
* Number of clustering steps: 1

* [Step 1/1] Clustering vocabulary level 1.
* Number of features          : 112895
* Number of clusters          : 500
* Initializing cluster centers...100.00%.
* Clustering...completed 15/100 iterations (~0.84 seconds/iteration)...converged in 15 iterations.

* Finished creating Bag-Of-Features
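The quantization step can be pictured with a toy example. The sketch below is not the bagOfFeatures implementation; it runs a few iterations of plain Lloyd's k-means on synthetic 2-D points to show how many descriptors collapse into a small set of cluster centers (the "visual words"). The data, the two-word vocabulary, and the initialization are all assumptions for illustration. It relies on implicit expansion, so it assumes MATLAB R2016b or later.

```matlab
% Synthetic "descriptors": two well-separated groups of 2-D points.
rng(0);                                          % reproducible toy data
descriptors = [randn(50,2); randn(50,2) + 10];
numWords    = 2;                                 % vocabulary size (500 in the example above)

% Hypothetical initialization: one descriptor taken from each group.
centers = descriptors([1 51], :);

for iter = 1:10
    % Squared Euclidean distance from every descriptor to every center.
    d2 = zeros(size(descriptors,1), numWords);
    for k = 1:numWords
        d2(:,k) = sum((descriptors - centers(k,:)).^2, 2);
    end
    [~, wordIdx] = min(d2, [], 2);               % nearest center per descriptor

    % Move each center to the mean of the descriptors assigned to it.
    for k = 1:numWords
        centers(k,:) = mean(descriptors(wordIdx == k, :), 1);
    end
end

disp(centers)                                    % the toy visual vocabulary
```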

Additionally, the bagOfFeatures object provides an encode method for counting the visual word occurrences in an image. It produces a histogram that becomes a new and reduced representation of the image.

img = readimage(imds, 1);
featureVector = encode(bag, img);
Encoding images using Bag-Of-Features.
--------------------------------------
* Encoding an image...done.
% Plot the histogram of visual word occurrences
figure
bar(featureVector)
title('Visual word occurrences')
xlabel('Visual word index')
ylabel('Frequency of occurrence')

[Figure: bar chart titled "Visual word occurrences"; x-axis: visual word index; y-axis: frequency of occurrence.]

This histogram forms a basis for training a classifier and for the actual image classification. In essence, it encodes an image into a feature vector.
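To make the encoding idea concrete, here is a minimal sketch with synthetic data: each descriptor is assigned to its nearest visual word, the assignments are counted, and the counts are normalized. The three-word vocabulary, the descriptors, and the choice of L2 normalization are assumptions for illustration; the encode method uses its own search and normalization settings.

```matlab
% Toy vocabulary of 3 visual words in a 2-D descriptor space (synthetic).
vocabulary  = [0 0; 10 0; 0 10];

% Five synthetic descriptors from one hypothetical image.
descriptors = [0.5 0.2; 9.7 0.1; 10.2 -0.3; 0.1 9.8; -0.4 0.3];

% Squared distance from every descriptor to every visual word.
numWords = size(vocabulary, 1);
d2 = zeros(size(descriptors,1), numWords);
for k = 1:numWords
    d2(:,k) = sum((descriptors - vocabulary(k,:)).^2, 2);
end
[~, wordIdx] = min(d2, [], 2);                    % nearest word per descriptor

% Histogram of visual word occurrences, as a row vector.
counts = accumarray(wordIdx, 1, [numWords 1])';

% Assumed normalization: scale the histogram to unit L2 norm.
featureVector = counts / norm(counts);
disp(featureVector)
```

Whatever the number of descriptors in an image, the resulting vector always has one entry per visual word, which is what makes it usable as a fixed-length classifier input.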

Encoded training images from each category are fed into a classifier training process invoked by the trainImageCategoryClassifier function. Note that this function relies on the multiclass linear SVM classifier from the Statistics and Machine Learning Toolbox™.

categoryClassifier = trainImageCategoryClassifier(trainingSet, bag);
Training an image category classifier for 5 categories.
--------------------------------------------------------
* Category 1: MathWorks Cap
* Category 2: MathWorks Cube
* Category 3: MathWorks Playing Cards
* Category 4: MathWorks Screwdriver
* Category 5: MathWorks Torch

* Encoding features for 45 images...done.

* Finished training the category classifier. Use evaluate to test the classifier on a test set.

The above function utilizes the encode method of the input bag object to formulate feature vectors representing each image category from the trainingSet.

Evaluate Classifier

Now that we have a trained classifier, categoryClassifier, let's evaluate it. As a sanity check, first test it on the training set, which should produce a near-perfect confusion matrix, that is, values close to one on the diagonal.

confMatrix = evaluate(categoryClassifier, trainingSet);
Evaluating image category classifier for 5 categories.
-------------------------------------------------------

* Category 1: MathWorks Cap
* Category 2: MathWorks Cube
* Category 3: MathWorks Playing Cards
* Category 4: MathWorks Screwdriver
* Category 5: MathWorks Torch

* Evaluating 45 images...done.

* Finished evaluating all the test sets.

* The confusion matrix for this test set is:


                                                                          PREDICTED
KNOWN                      | MathWorks Cap   MathWorks Cube   MathWorks Playing Cards   MathWorks Screwdriver   MathWorks Torch   
----------------------------------------------------------------------------------------------------------------------------------
MathWorks Cap              | 1.00            0.00             0.00                      0.00                    0.00              
MathWorks Cube             | 0.00            0.89             0.00                      0.00                    0.11              
MathWorks Playing Cards    | 0.00            0.00             1.00                      0.00                    0.00              
MathWorks Screwdriver      | 0.00            0.00             0.00                      1.00                    0.00              
MathWorks Torch            | 0.00            0.00             0.00                      0.00                    1.00              

* Average Accuracy is 0.98.

Next, let's evaluate the classifier on the validationSet, which was not used during the training. By default, the evaluate function returns the confusion matrix, which is a good initial indicator of how well the classifier is performing.

confMatrix = evaluate(categoryClassifier, validationSet);
Evaluating image category classifier for 5 categories.
-------------------------------------------------------

* Category 1: MathWorks Cap
* Category 2: MathWorks Cube
* Category 3: MathWorks Playing Cards
* Category 4: MathWorks Screwdriver
* Category 5: MathWorks Torch

* Evaluating 30 images...done.

* Finished evaluating all the test sets.

* The confusion matrix for this test set is:


                                                                          PREDICTED
KNOWN                      | MathWorks Cap   MathWorks Cube   MathWorks Playing Cards   MathWorks Screwdriver   MathWorks Torch   
----------------------------------------------------------------------------------------------------------------------------------
MathWorks Cap              | 1.00            0.00             0.00                      0.00                    0.00              
MathWorks Cube             | 0.00            0.50             0.17                      0.17                    0.17              
MathWorks Playing Cards    | 0.00            0.00             1.00                      0.00                    0.00              
MathWorks Screwdriver      | 0.00            0.00             0.00                      1.00                    0.00              
MathWorks Torch            | 0.17            0.00             0.00                      0.00                    0.83              

* Average Accuracy is 0.87.
% Compute average accuracy
mean(diag(confMatrix))
ans = 0.8667
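The average accuracy is the mean of the diagonal of the row-normalized confusion matrix, so it weights every category equally. The sketch below rebuilds the validation result from raw counts to show where 0.8667 comes from and how a per-category precision could be derived. The counts are an assumption: they are reconstructed from the printed fractions on the premise that each category has 6 validation images (30 images over 5 categories).

```matlab
% Assumed raw counts (rows: known category, columns: predicted category),
% reconstructed from the printed validation matrix with 6 images per row.
counts = [6 0 0 0 0;
          0 3 1 1 1;
          0 0 6 0 0;
          0 0 0 6 0;
          1 0 0 0 5];

% Row-normalize so that each row sums to one, as in the printed matrix.
confMat = counts ./ sum(counts, 2);

% Average accuracy: mean of the per-category recall on the diagonal.
avgAccuracy = mean(diag(confMat));

% Per-category precision: correct predictions over all predictions of that category.
precision = diag(counts)' ./ sum(counts, 1);

fprintf('Average accuracy: %.4f\n', avgAccuracy);
disp(precision)
```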

You can tune the bagOfFeatures hyperparameters and continue evaluating the trained classifier until you are satisfied with the results. Additional statistics can be derived from the remaining output arguments returned by the evaluate function. See help for imageCategoryClassifier/evaluate.
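One way to organize such tuning is a small sweep over a single hyperparameter. The sketch below varies the vocabulary size; the candidate values are hypothetical and not recommendations. The loop is guarded so that it runs only when the toolbox function and the trainingSet and validationSet datastores from this example are available; otherwise the accuracies stay NaN.

```matlab
% Hypothetical candidate vocabulary sizes; the values are illustrative only.
vocabSizes = [250 500 1000];
accuracy   = nan(size(vocabSizes));

% Run the sweep only if the toolbox function and the example's datastores exist.
if exist('bagOfFeatures', 'file') && exist('trainingSet', 'var') && exist('validationSet', 'var')
    for i = 1:numel(vocabSizes)
        % Build a vocabulary of the requested size, suppressing progress output.
        bag_i = bagOfFeatures(trainingSet, 'VocabularySize', vocabSizes(i), 'Verbose', false);
        clf_i = trainImageCategoryClassifier(trainingSet, bag_i, 'Verbose', false);

        % Score each setting by average accuracy on the validation set.
        confMat_i   = evaluate(clf_i, validationSet);
        accuracy(i) = mean(diag(confMat_i));
    end
end

disp(table(vocabSizes', accuracy', 'VariableNames', {'VocabularySize', 'Accuracy'}))
```

With only 30 validation images, differences between settings can easily be noise, so a result from a sweep like this is best treated as a rough guide.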

Classify Object in Image

You can now apply the newly trained classifier to categorize new images.

img = imread(fullfile('MerchData','MathWorks Cap','Hat_0.jpg'));
figure
imshow(img)

[Figure: the input image to be classified.]

[labelIdx, scores] = predict(categoryClassifier, img);
Encoding images using Bag-Of-Features.
--------------------------------------
* Encoding an image...done.
% Display the string label
categoryClassifier.Labels(labelIdx)
ans = 1x1 cell array
    {'MathWorks Cap'}
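The scores output can help judge how decisive a prediction is. The sketch below uses made-up scores and stand-in category names, and assumes that a higher score indicates a stronger match, so that the predicted label is the one with the largest score. It reports the gap between the best and second-best categories.

```matlab
% Stand-in category names and made-up scores for a single image (one score per category).
labels = {'cap', 'cube', 'cards', 'screwdriver', 'torch'};
scores = [-0.10 -0.45 -0.38 -0.52 -0.30];

% Rank the categories from best to worst score.
[sortedScores, order] = sort(scores, 'descend');
labelIdx = order(1);

% Gap between the best and the runner-up; a small gap suggests an ambiguous image.
margin = sortedScores(1) - sortedScores(2);

fprintf('Predicted: %s (margin over %s: %.2f)\n', ...
    labels{labelIdx}, labels{order(2)}, margin);
```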