In a previous blog article, we outlined several deep learning techniques for OCR. We mentioned how these techniques are used for text detection and text recognition which are the two primary building blocks of an OCR system.

In this article, we will go deeper into how deep learning is being used for text detection, which is the first block when doing OCR.

We are planning to release more articles where we will explore more deep learning techniques for text detection and text recognition.

Object detection for text detection

Object detection is a field in deep learning that is applied in several computer vision tasks, including text detection. There are mainly 2 types of object detection models: one-stage detectors and two-stages detectors.

1. One-stage object detection

Some famous examples of one-stage detectors are SSD (Single Shot Multibox Detectors) and YOLO (You Only Look Once).

These object detection models are very similar in the way they work. They take an image as input, then this image is passed through a set of convolutional layers. These layers are usually part of a larger network that was pre-trained on a large dataset of images.

After each convolutional layer, we get a set of feature maps, which will then be used to identify objects’ regions and classes. This identification process can be different from one network to another. In SSD for example, we use the output feature maps from each convolutional layer to try to identify objects’ regions and classes. But in YOLO for example, only the last layers of the network are used to try and identify objects in the image, though all the network’s layers are contributing to the process by learning relevant features about the dataset.