Object detection in images is a key element of an extended monitoring system. The field, developed over many years, allows specific objects to be recognised in photographs and films. Among other things, this translates into increased road safety and real-time situational analysis. But what is the principle of image detection, and which algorithms are used?
What is Image Detection?
Let us start by briefly explaining what image detection actually means. It is nothing other than recognising an object and classifying it into a specific group or category. Here, “image” is understood as a quantitative description of an object, situation, phenomenon, signal or event.
Object detection techniques are used daily on production lines, at intersections, in autonomous vehicles and in security systems. Every image detection system must be as precise as possible to avoid misclassifying objects. Such a mistake could translate into a hazard to health or even life, for example when dealing with self-driving cars.
Object detection uses algorithms based on convolutional neural networks, whose successive layers respond to features such as colour, shape, size or texture. Combined with deep learning, in which the network is trained on an ever-increasing amount of image data, these algorithms are able to gradually increase their accuracy.
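To make the idea of layers responding to visual features concrete, here is a minimal sketch of a single convolution, the basic operation of such networks. The hand-written Sobel-style kernel stands in for a filter that an early CNN layer would learn on its own; the image is a synthetic toy example, not a real photograph.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel, similar to filters early CNN layers learn.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Synthetic 6x6 image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

response = conv2d(image, sobel_x)
# The response is strongest at the vertical edge between the two halves.
```

A trained network stacks many such filters, so that deeper layers combine edge responses into shapes, textures and eventually whole-object detectors.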
We can distinguish three main types of Computer Vision tasks:
- image classification – assigning an image to one label from a previously prepared set of categories,
- object localization – the algorithm indicates the exact location of an object in the image,
- object detection – detection of objects from the real world, such as faces, buildings, cars.
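The output of the third task, object detection, can be pictured as a list of labelled, scored bounding boxes. The sketch below is a schematic illustration of that output format, with made-up boxes and scores rather than the output of any real model; the `Detection` class and threshold value are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str
    score: float                      # confidence in [0, 1]
    box: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels

def filter_detections(detections: List[Detection], threshold: float = 0.5):
    """Keep only detections the model is sufficiently confident about."""
    return [d for d in detections if d.score >= threshold]

# Hypothetical raw output of a detector run on a street scene.
raw = [
    Detection("car", 0.92, (34, 60, 180, 140)),
    Detection("pedestrian", 0.87, (200, 50, 240, 160)),
    Detection("car", 0.31, (10, 10, 40, 30)),   # likely a false positive
]

confident = filter_detections(raw, threshold=0.5)
# Two detections survive: the car at 0.92 and the pedestrian at 0.87.
```

Image classification, by contrast, would return a single label for the whole scene, and localization a single box, which is why detection is the task used in monitoring systems.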
R-CNN Model Family
R-CNN is a family of algorithms consisting of the base R-CNN algorithm together with Fast R-CNN and Faster R-CNN. The abbreviation R-CNN stands for Regions with CNN (Convolutional Neural Network) features. Each subsequent algorithm builds on the previous one, hence the qualifiers “Fast” and “Faster” in the later names.
The R-CNN algorithm consists of three modules:
- Region Proposal – generating candidate regions that may contain objects,
- Feature Extractor – computing a CNN feature vector for each region,
- Classifier – assigning each region to one of the predefined categories.
The R-CNN algorithm works by first generating around two thousand candidate regions from the input image, then extracting features from each region and classifying it into one of the predefined categories. R-CNN results are reported as mAP (mean Average Precision); R-CNN achieves a mAP of 53.7%.
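The three-module pipeline above can be sketched end to end with simple stand-ins: random boxes in place of the real region-proposal step, quadrant means in place of the CNN feature extractor, and a nearest-prototype rule in place of the per-class classifiers. All of these substitutes are assumptions made to keep the example self-contained; they illustrate only the flow of data, not the real algorithm's quality.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_regions(image, n=5):
    """Stand-in for region proposal: sample candidate boxes at random.
    (The real R-CNN generates ~2000 proposals per image.)"""
    h, w = image.shape
    boxes = []
    for _ in range(n):
        x0, y0 = int(rng.integers(0, w - 8)), int(rng.integers(0, h - 8))
        boxes.append((x0, y0, x0 + 8, y0 + 8))
    return boxes

def extract_features(patch):
    """Stand-in for the CNN feature extractor: mean intensity per quadrant."""
    h, w = patch.shape
    return np.array([patch[:h//2, :w//2].mean(), patch[:h//2, w//2:].mean(),
                     patch[h//2:, :w//2].mean(), patch[h//2:, w//2:].mean()])

def classify(features, prototypes):
    """Stand-in for the classifier module: nearest prototype by distance."""
    dists = {label: np.linalg.norm(features - p) for label, p in prototypes.items()}
    return min(dists, key=dists.get)

image = rng.random((32, 32))
prototypes = {"bright": np.full(4, 0.8), "dark": np.full(4, 0.2)}

labels = []
for (x0, y0, x1, y1) in propose_regions(image):
    patch = image[y0:y1, x0:x1]
    labels.append(classify(extract_features(patch), prototypes))
```

Running the CNN separately on every one of the ~2000 proposals is exactly what makes the base R-CNN slow, and what the later family members set out to fix.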
The success of R-CNN showed that it was worth improving, and a faster version, Fast R-CNN, was created. Fast R-CNN achieves a mAP of 68.4%.
The Faster R-CNN algorithm is designed to be even more accurate in less time. Its mAP amounts to 78.8%. Faster R-CNN consists of two modules:
- Region Proposal Network – proposing regions and object types,
- Fast R-CNN detector – extracting features from the proposed regions, refining the bounding box and assigning class labels.
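The first module's core idea, placing reference boxes ("anchors") at every cell of a feature map and keeping those with the highest objectness scores, can be sketched as follows. The grid size, stride, anchor sizes and the random scores standing in for the network's output are all toy assumptions for illustration.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16, sizes=(32, 64)):
    """Place reference boxes ('anchors') at every feature-map cell,
    as the Region Proposal Network does."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = j * stride + stride // 2, i * stride + stride // 2
            for s in sizes:
                anchors.append((cx - s // 2, cy - s // 2,
                                cx + s // 2, cy + s // 2))
    return np.array(anchors)

def top_proposals(anchors, objectness, k=3):
    """Keep the k anchors scored as most likely to contain an object."""
    order = np.argsort(objectness)[::-1][:k]
    return anchors[order]

anchors = generate_anchors(4, 4)   # 4x4 feature map, 2 anchor sizes -> 32 anchors
scores = np.random.default_rng(1).random(len(anchors))  # stand-in for RPN output
proposals = top_proposals(anchors, scores)
# These surviving boxes would then feed the Fast R-CNN detector head.
```

Because the proposals come from the same feature map the detector uses, the expensive convolutional work is shared between the two modules, which is where Faster R-CNN's speed gain comes from.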
YOLO Model Family
Another popular family of object recognition algorithms is YOLO, which stands for “You Only Look Once”. Although less accurate than the R-CNN algorithms, YOLO is much faster, which makes it well suited to processing dynamic images in real time.
The YOLO algorithm divides the image into a grid and, in a single pass, predicts bounding boxes for objects together with their class probabilities. The full network has 24 convolutional layers. A simplified version with only 9 convolutional layers is also available, which speeds up image processing but reduces recognition accuracy.
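The single-pass idea can be illustrated by decoding a YOLO-style output tensor: each grid cell predicts a fixed number of boxes (x, y, w, h, confidence) plus class probabilities, and a detection is kept when confidence times class probability clears a threshold. The grid size, box count, class count and the hand-filled tensor below are toy assumptions, not the real network's dimensions.

```python
import numpy as np

S, B, C = 3, 2, 4   # grid size, boxes per cell, number of classes (toy values)

def decode_yolo(output, conf_threshold=0.5):
    """Decode a YOLO-style output tensor of shape (S, S, B*5 + C)."""
    detections = []
    for i in range(S):
        for j in range(S):
            cell = output[i, j]
            class_probs = cell[B * 5:]
            best_class = int(np.argmax(class_probs))
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_probs[best_class]
                if score >= conf_threshold:
                    detections.append((i, j, best_class, float(score)))
    return detections

output = np.zeros((S, S, B * 5 + C))
# One confident box in cell (1, 2): confidence 0.9, class 3 with probability 0.8.
output[1, 2, 0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]
output[1, 2, B * 5 + 3] = 0.8

dets = decode_yolo(output)
# One detection survives: cell (1, 2), class 3, score 0.9 * 0.8 = 0.72.
```

Because the whole image is processed in one forward pass, with no separate proposal stage, this decoding step is all that stands between the network output and the final detections.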
The YOLO algorithm can be successfully used in real time, e.g. to process surveillance footage. The mAP for YOLO is 63.4%.
SSD Model
The SSD (Single Shot MultiBox Detector) algorithm was originally built on the Caffe deep learning framework. It partly draws on the R-CNN family, but uses different techniques. It is fast and highly effective. SSD uses several default boxes at different scales to define objects of different types, which allows for quick recognition of both small and large objects.
The efficiency of the algorithm reaches 79.6% mAP for the SSD300 version, and as much as 81.6% mAP for SSD512.
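All of the mAP figures quoted in this article rest on one basic measure: Intersection over Union (IoU), which scores how well a predicted box overlaps a ground-truth box. A minimal sketch of it, with hand-picked example boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x_min, y_min, x_max, y_max) boxes.
    mAP benchmarks typically count a detection as correct when IoU >= 0.5."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, xb - xa) * max(0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
# 25 / 175, about 0.143 - too low to count as a correct detection at IoU >= 0.5.
```

Average Precision then summarises precision and recall over all detections ranked by confidence, and mAP averages that value across classes, which is what the 53.7%–81.6% figures above compare.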