Digit sequence detection on the SVHN dataset
Unlike single-digit recognition problems such as MNIST, images in the SVHN (Street View House Numbers) dataset can contain more than one digit in sequence.
In this project, we used MSER detection and a modified pre-trained VGG16 convolutional neural network to predict the digit sequence in an arbitrary image from the SVHN dataset. For comparison, we also trained a VGG16 without pre-trained weights and a much smaller CNN that we built ourselves.
The prediction pipeline is as follows:
MSER detects blob regions → CNN finds ROIs (regions of interest) → overlapping/nearby ROIs are combined → CNN finds digits
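The ROI-combining step of the pipeline can be illustrated with a minimal sketch. How ROIs were actually merged in this project is not described here, so everything below is an assumption: boxes are taken to be (x, y, width, height) tuples, two boxes are merged when they overlap or lie within a hypothetical `gap` of pixels, and the helper names `boxes_close`, `union_box` and `merge_boxes` are made up for illustration.

```python
def boxes_close(a, b, gap=0):
    """Return True if boxes a and b overlap or lie within `gap` pixels.

    Boxes are (x, y, w, h) tuples (an assumed format).
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax <= bx + bw + gap and bx <= ax + aw + gap and
            ay <= by + bh + gap and by <= ay + ah + gap)


def union_box(a, b):
    """Return the smallest box that contains both a and b."""
    x1 = min(a[0], b[0])
    y1 = min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def merge_boxes(boxes, gap=0):
    """Repeatedly merge overlapping/nearby boxes until nothing changes."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        result = []
        for box in boxes:
            for i, other in enumerate(result):
                if boxes_close(box, other, gap):
                    # Grow the already kept box to cover the new one.
                    result[i] = union_box(box, other)
                    changed = True
                    break
            else:
                # No nearby box found: keep this one as a separate ROI.
                result.append(box)
        boxes = result
    return boxes


rois = [(0, 0, 10, 10), (8, 0, 10, 10), (100, 100, 5, 5)]
merged = merge_boxes(rois, gap=2)
print(merged)
```

In this example the first two boxes overlap and are combined into one wider ROI, while the distant third box is left alone. The outer loop is repeated because a merged box may become close to a box it did not touch before.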
For training purposes, we label each image with five integers. The first four integers represent the digits in the image, with 0 meaning no digit at that position and 1-10 representing the digits 1-9 and 0 (that is, 10 encodes the digit 0). The fifth integer indicates whether the area contains any digit: 0 for a non-digit area and 1 for a window with digit(s).
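To make the label format concrete, here is a small sketch of an encoder and decoder. It assumes that digits are stored left to right and that unused positions are padded with 0 at the end, which the description above does not state explicitly. The function names `encode_label` and `decode_label` are hypothetical, and rejecting sequences longer than four digits is also just an assumption.

```python
MAX_DIGITS = 4  # the label reserves four digit positions


def encode_label(digits):
    """Encode a digit string such as "205" as five integers.

    An empty string stands for a non-digit window.
    """
    if len(digits) > MAX_DIGITS:
        # Assumption: longer sequences are not representable.
        raise ValueError("at most %d digits are supported" % MAX_DIGITS)
    # The digit 0 is stored as 10, since 0 means "no digit here".
    codes = [10 if ch == "0" else int(ch) for ch in digits]
    # Assumption: pad unused positions at the end with 0.
    codes += [0] * (MAX_DIGITS - len(codes))
    has_digit = 1 if digits else 0
    return codes + [has_digit]


def decode_label(label):
    """Turn a five-integer label back into a digit string."""
    if label[4] == 0:
        return ""  # non-digit area
    return "".join("0" if c == 10 else str(c)
                   for c in label[:MAX_DIGITS] if c != 0)


label = encode_label("205")
print(label)                # [2, 10, 5, 0, 1]
print(decode_label(label))  # 205
print(encode_label(""))     # [0, 0, 0, 0, 0]
```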
After training on the SVHN dataset, we obtained the test set accuracies of the VGG16 models and the self-built model shown below. Since each label has five integers, we can compute an element-wise accuracy, which evaluates each integer of the label separately, and a sample-wise accuracy, which evaluates each complete image sample. A sample counts as correct only if all five integers in its label are predicted correctly.
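The difference between the two metrics can be shown with a short sketch. The function names `elementwise_accuracy` and `samplewise_accuracy` and the toy labels are made up for illustration and are not results from this project.

```python
def elementwise_accuracy(y_true, y_pred):
    """Fraction of individual label integers predicted correctly."""
    total = 0
    correct = 0
    for true_label, pred_label in zip(y_true, y_pred):
        for t, p in zip(true_label, pred_label):
            total += 1
            if t == p:
                correct += 1
    return correct / total


def samplewise_accuracy(y_true, y_pred):
    """Fraction of samples whose five integers are all correct."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


# Toy data: the first sample has one wrong digit, the second is perfect.
y_true = [[2, 10, 5, 0, 1], [0, 0, 0, 0, 0]]
y_pred = [[2, 10, 6, 0, 1], [0, 0, 0, 0, 0]]

elem_acc = elementwise_accuracy(y_true, y_pred)
samp_acc = samplewise_accuracy(y_true, y_pred)
print(elem_acc)  # 0.9
print(samp_acc)  # 0.5
```

A single wrong digit costs only one of ten elements but invalidates the whole sample, so the sample-wise accuracy can never exceed the element-wise accuracy.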
Overall test set accuracies for different models
Detailed training curves for pre-trained VGG16
It is clear that the pre-trained VGG16 achieves the best prediction accuracy of the three models. We therefore use this model, together with the MSER detector, to locate and recognize the digit sequences in a given image or video.
The model shows good detection and recognition performance. For vertically aligned digits, however, it does not perform as well, since the training data are mostly horizontally aligned, with windows that are wider than they are tall.
We also tested the model on a video clip showing digits on a building wall, as below.