Binary Robust Independent Elementary Features

Author: Sumit Lakhera

BRIEF is a general-purpose feature point descriptor that can be combined with arbitrary detectors. It is robust to typical classes of photometric and geometric image transformations. BRIEF is targeting real-time applications leaving them with a large portion of the available CPU power for subsequent tasks but also allows running feature point matching algorithms on computationally weak devices such as mobile phones. It is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. As a result, BRIEF is very fast both to build and to match. We can compare it against SURF and U-SURF on standard benchmarks and see that it yields a similar or better recognition performance, while running in a fraction of the time required by either. It yields a similar or better recognition performance, while running in a fraction of the time required by either.

Introduction

Feature point descriptors are now at the core of many Computer Vision technologies, such as object recognition, 3D reconstruction, image retrieval, and camera localization. Since applications of these technologies have to handle ever more data or to run on mobile devices with limited computational resources, there is a growing need for local descriptors that are fast to compute, fast to match, and memory eﬃcient.

Fig - 1

In this article, I have used binary strings as an eﬃcient feature point descriptor called BRIEF. BRIEF is very fast both to build and to match. BRIEF easily outperforms other fast descriptors such as SURF and SIFT in terms of speed and terms of recognition rate in many cases.

Theory

SIFT uses 128-dim vector for descriptors. Since it is using floating point numbers, it takes basically 512 bytes. Similarly SURF also takes minimum of 256 bytes (for 64-dim). Creating such a vector for thousands of features takes a lot of memory which are not feasible for resource-constraint applications especially for embedded systems. Larger the memory, longer the time it takes for matching.

But all these dimensions may not be needed for actual matching. We can compress it using several methods like PCA, LDA etc. Even other methods like hashing using LSH (Locality Sensitive Hashing) is used to convert these SIFT descriptors in floating point numbers to binary strings. These binary strings are used to match features using Hamming distance. This provides better speed-up because finding hamming distance is just applying XOR and bit count, which are very fast in modern CPUs with SSE instructions. But here, we need to find the descriptors first, then only we can apply hashing, which doesn't solve our initial problem on memory.

BRIEF comes into picture at this moment. It provides a shortcut to find the binary strings directly without finding descriptors. It takes smoothened image patch and selects a set of nd (x, y) location pairs in an unique way (explained in paper). Then some pixel intensity comparisons are done on these location pairs. For eg, let first location pairs be p and q. If I(p)<I(q), then its result is 1, else it is 0. This is applied for all the nd location pairs to get a nd-dimensional bit string.

This nd can be 128, 256 or 512. OpenCV supports all of these, but by default, it would be 256 (OpenCV represents it in bytes. So the values will be 16, 32 and 64). So once you get this, you can use Hamming Distance to match these descriptors.

One important point is that BRIEF is a feature descriptor, it doesn't provide any method to find the features. So you will have to use any other feature detectors like SIFT, SURF etc. The paper recommends to use CenSurE which is a fast detector and BRIEF works even slightly better for CenSurE points than for SURF points. In short, BRIEF is a faster method feature descriptor calculation and matching. It also provides high recognition rate unless there is large in-plane rotation.

STAR (CenSurE) in OpenCV

STAR is a feature detector derived from CenSurE. Unlike CenSurE however, which uses polygons like squares, hexagons and octagons to approach a circle, Star emulates a circle with 2 overlapping squares: one upright and one 45-degree rotated. These polygons are bi-level. They can be seen as polygons with thick borders. The borders and the enclosed area have weights of opposing signs. This has better computational characteristics than other scale-space detectors and it is capable of real-time implementation. In contrast to SIFT and SURF, which find extrema at sub-sampled pixels that compromises accuracy at larger scales, CenSurE creates a feature vector using full spatial resolution at all scales in the pyramid.

Background

After detecting key points we go on to compute a descriptor for every one of them. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical “fingerprint” that can be used to differentiate one feature from another. The defined neighborhood around pixel (key point) is known as a patch, which is a square of some pixel width and height.

Fig - 2

Green Square in the patch is key point

Image patches could be eﬀectively classiﬁed on the basis of a relatively small number of pair-wise intensity comparisons. Brief convert image patches into a binary feature vector so that together they can represent an object. Binary features vector also know as binary feature descriptor is a feature vector that only contains 1 and 0. In brief, each key point is described by a feature vector which is 128–512 bits string.

Binary features vector :

V1= [01011100100110…

V2= [10010100110100…

V3= [11000100101110…

V4= [01011111100100…

Smoothing Kernels

Brief deals with the image at pixel level so it is very noise-sensitive. By pre-smoothing the patch, this sensitivity can be reduced, thus increasing the stability and repeatability of the descriptors. It is for the same reason that images need to be smoothed before they can be meaningfully diﬀerentiated when looking for edges.

Fig - 3

Brief uses Gaussian kernel for smoothing image. In the figure below we comparison of Gaussian smoothing on the recognition rates for variances of Gaussian kernel ranging from 0 to 3. The more diﬃcult the matching, the more important smoothing becomes to achieving good performance. The recognition rates remain relatively constant in the 1 to 3 range and, in practice, we use a value of 2.

Fig - 4

Smoothed Image to Binary Feature Vector

Now we have smoothed image patch, p. Now our target is to create a binary feature vector out of this patch. We simply create a binary feature vector of the binary test(τ) responses. A binary test τ is defined by:

Where T(p; x, y) is defined as : {1 : p(x) < p(y)}

{0 : p(x) >= p(y)}

Where p(x) is the intensity of p at a point x. Choosing a set of n(x, y)-location pairs uniquely deﬁnes a set of binary tests. Where n is the length of the binary feature vector and it could be 128, 256, and 512.

How to select (x, y) pairs?

Generating a length n bit vector leaves many options for selecting the test locations((x, y) pair). The (x, y) pair is also called random pair which is located inside the patch. Total we have to select n test(random pair) for creating a binary feature vector and we have to choose this n test from one of five approaches(Sampling Geometries) given below.

Fig - 5

Diﬀerent approaches to choosing the test locations

Let consider the size of patch p is (S x S) and assuming keypoint is located in the center of the patch.

Uniform(G I): Both x and y pixels in the random pair is drawn from a Uni from distribution or spread of S/2 around keypoint. The pair(test) can lie close to the patch border.

Gaussian(G II): Both x and y pixels in the random pair is drawn from a Gaussian distribution or spread of 0.04 * S² around keypoint.

Gaussian(G III): The first pixel(x) in the random pair is drawn from a Gaussian distribution centered around the keypoint with a stranded deviation or spread of 0.04 * S². The second pixel(y) in the random pair is drawn from a Gaussian distribution centered around the first pixel(x) with a standard deviation or spread of 0.01 * S². This forces the test(pair) to be more local. Test(pair) locations outside the patch are clamped to the edge of the patch.

Coarse Polar Grid(G IV): Both x and y pixels in the random pair is sampled from discrete locations of a coarse polar grid introducing a spatial quantization.

Coarse Polar Grid(G V): The first pixel(x) in random pair is at (0, 0) and the second pixel(y) in the random pair is drawn from discrete locations of a coarse polar grid.

Fig - 6

The recognition rate for the ﬁve diﬀerent test geometries

Finally, our BRIEF descriptor look like:

Fig - 7

Conclusion

Brief relies on a relatively small number of intensity diﬀerence tests to represent an image patch as a binary string. Not only is construction and matching for this descriptor much faster than for other state-of-the-art ones, but it also tends to yield higher recognition rates, as long as invariance to large in-plane rotations is not a requirement.

GitHub Link :

https://github.com/Sumit-321/My-work-samples/blob/main/BRIEF.ipynb

Madras Scientific Research Foundation