Digit sequence detection on the SVHN dataset

Unlike single-digit recognition problems such as MNIST, images in the SVHN dataset contain more than one digit in series.

SVHN Samples. Ref: http://ufldl.stanford.edu/housenumbers/

In this project, we used MSER detection and a modified pre-trained VGG16 convolutional neural network to predict the digit sequence in an arbitrary image from the SVHN dataset. For comparison, we also trained a VGG16 without pre-trained weights and a much smaller self-built CNN.

VGG16 Architecture. Ref: https://neurohive.io/en/popular-networks/vgg16/ 

Self-built CNN 

The prediction pipeline is as follows:

MSER detects blob regions → CNN finds ROIs (regions of interest) → Combine overlapping/nearby ROIs → CNN finds digits
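
As an illustration of the ROI-combining step in the pipeline, here is a minimal sketch in plain Python. It is not the code used in this project: the box format `(x_min, y_min, x_max, y_max)`, the function names `boxes_are_close` and `merge_rois`, and the `gap` threshold of 5 pixels are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def boxes_are_close(a: Box, b: Box, gap: int) -> bool:
    """Return True if the boxes overlap or lie within `gap` pixels of each other."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2]
            and a[1] - gap <= b[3] and b[1] - gap <= a[3])


def merge_rois(boxes: List[Box], gap: int = 5) -> List[Box]:
    """Repeatedly merge overlapping or nearby boxes into their bounding box."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_are_close(merged[i], merged[j], gap):
                    a, b = merged[i], merged[j]
                    # Replace the pair with the smallest box covering both.
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the merged box may touch others
    return merged


rois = [(0, 0, 10, 20), (12, 0, 22, 20), (100, 100, 110, 120)]
print(merge_rois(rois, gap=5))
```

The scan restarts after every merge because a newly enlarged box can come within range of boxes it did not touch before. This is quadratic in the number of boxes, which is acceptable for the handful of candidate regions per image assumed here.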

For training, we label each image with five integers. The first four integers represent the digits in the image: 0 means no digit at that position, 1-9 represent the digits 1-9, and 10 represents the digit 0. The fifth integer indicates whether the area contains any digit: 0 for a non-digit area and 1 for a window with one or more digits.
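
The five-integer label can be sketched as a small encoder and decoder. This is a hedged illustration rather than the project's actual code: the names `encode_label` and `decode_label` are hypothetical, and the sketch assumes that digits fill the slots from left to right and that sequences longer than four digits are rejected, neither of which is stated above.

```python
from typing import List, Sequence

MAX_DIGITS = 4  # four digit slots per label, as described in the text


def encode_label(digits: str) -> List[int]:
    """Encode a digit string such as "205" as a five-integer label.

    Slots 0-3: 0 = no digit, 1-9 = digits 1-9, 10 = digit 0.
    Slot 4:    1 if the window contains any digit, otherwise 0.
    Assumption: digits are left-aligned and unused slots are padded with 0.
    """
    if len(digits) > MAX_DIGITS or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected a string of at most four decimal digits")
    slots = [10 if ch == "0" else int(ch) for ch in digits]
    slots += [0] * (MAX_DIGITS - len(slots))
    return slots + [1 if digits else 0]


def decode_label(label: Sequence[int]) -> str:
    """Turn a five-integer label back into a digit string."""
    if label[MAX_DIGITS] == 0:
        return ""  # flagged as a non-digit area
    return "".join("0" if v == 10 else str(v)
                   for v in label[:MAX_DIGITS] if v != 0)


print(encode_label("205"))   # digit 0 is stored as 10
print(encode_label(""))      # a non-digit window
print(decode_label([2, 10, 5, 0, 1]))
```

Mapping the digit 0 to 10 keeps the value 0 free to mean "no digit at this position", which matches the SVHN convention of labelling the digit 0 as class 10.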

After training on the SVHN dataset, we obtained the test set accuracies of the VGG16 and self-built models shown below. Because each label has five integers, we can report two metrics: element-wise accuracy, which scores each integer in the label separately, and sample-wise accuracy, which scores each complete image sample. A sample counts as correct only if all five integers in its label are predicted correctly.
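
The difference between the two metrics can be made concrete with a short sketch. The function name `label_accuracies` is hypothetical, and the sketch assumes that element-wise accuracy is pooled over all five positions of all samples; the project may instead have reported it per position.

```python
from typing import Sequence, Tuple


def label_accuracies(y_true: Sequence[Sequence[int]],
                     y_pred: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (element_wise_accuracy, sample_wise_accuracy) for integer labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct_elements = 0
    total_elements = 0
    correct_samples = 0
    for true_label, pred_label in zip(y_true, y_pred):
        matches = [t == p for t, p in zip(true_label, pred_label)]
        correct_elements += sum(matches)
        total_elements += len(matches)
        # A sample is correct only if every integer in its label matches.
        correct_samples += all(matches)
    return correct_elements / total_elements, correct_samples / len(y_true)


y_true = [[2, 10, 5, 0, 1], [0, 0, 0, 0, 0]]
y_pred = [[2, 10, 5, 0, 1], [1, 0, 0, 0, 0]]
element_acc, sample_acc = label_accuracies(y_true, y_pred)
print(element_acc, sample_acc)  # 0.9 0.5
```

In this toy example a single wrong integer costs one tenth of the element-wise score but half of the sample-wise score, which is why sample-wise accuracy is always the lower or equal of the two.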

Overall test set accuracies for different models

Detailed training curves for pre-trained VGG16


The pre-trained VGG16 achieves the highest prediction accuracy of the three models. We therefore use it, together with the MSER detector, to locate and read digit sequences in a given image or video.

The model shows good detection and recognition performance. For vertically aligned digits, however, it does not perform as well, because the training images are mostly horizontally aligned, with width larger than height.

Width: Height ratio distribution of all images in SVHN dataset

We also tested the model on a video clip showing digits on a building wall, as below.

 
