Today, computer vision makes computers more intelligent: they can recognize simple objects such as a car, a building, or a cup. In this post, we explain how to make a computer recognize a box. In general, for object detection we cannot rely on a simple method such as color-based or shape-based detection. This project uses a feature-based method to detect the object; in particular, we use the Scale-Invariant Feature Transform (SIFT) to detect the box. Before discussing the method, we show you a video demonstration of the automatic object detection. The next part covers the preparation of hardware and software. After that, we walk through the implementation and explanation of the algorithm, then compile and run the code. Along the way, you can also have a look at an application of object detection for robotics. The article closes with our conclusion.
Video demo for object detection
As you can see in the video, the algorithm detects the box even as we move it anywhere, in any direction. That is the advantage of the SIFT method. In the top-left corner of the video you can see the reference image that represents the object; in computer vision this is called the training image. The objective of this project is to find this object in the real environment in front of the camera. On every frame where the object (training image) is found, a rectangle is drawn on the video at the object's position.
Preparation of Hardware & Software
- 1 USB Camera
- Ubuntu 14.04
- OpenCV 2.4.11
- A C++ compiler, make, etc.
Implementation and explanation
SIFT, and SIFT-like methods used for object matching, are normally divided into 3 important steps:
- Key points detection (for both training image and the image we need to search the object on)
- Descriptor extraction (for both training image and the image we need to search the object on)
- Descriptor matching
The descriptor matching step is what lets the algorithm find the object. After the descriptors of the 2 images are extracted, we compare each descriptor of the training image with all the descriptors of the real image. If 2 descriptors are the same (or almost the same), we consider them a matching pair.
In this sample we use the SIFT method to extract features, but for object detection you can also use other feature-based methods such as SURF, BRIEF, etc. To understand this method in more detail, you can consult the official SIFT page: http://www.cs.ubc.ca/~lowe/keypoints/.
To do this project, we need to prepare an image, called the training image, by capturing the object that you want to detect (the image in the top-left corner of the video is an example). The algorithm will then find this object.
Detect key points of the training image
//-- Step 1: Detect key points
// Note: "minHessian" is a SURF parameter name; here the value is in fact
// passed as SIFT's nfeatures argument (keep the best 400 features).
int minHessian = 400;
SiftFeatureDetector detector( minHessian );
std::vector<KeyPoint> keypoints_object;
detector.detect( img_object, keypoints_object );
The code above defines the necessary variables and detects the key points. A key point is a feature: it contains the position in the image, the scale, the angle, and the descriptor. After this code, however, we only have the position, scale and angle; to extract the descriptor, the method continues with the next step.
Extract descriptors of the training image
//-- Step 2: Calculate descriptors (feature vectors)
SiftDescriptorExtractor extractor;
Mat descriptors_object;
extractor.compute( img_object, keypoints_object, descriptors_object );
At this point the algorithm has extracted all the features of the training image. We do the same with the image we search for the object in (each frame). For each frame of the video (in our project we don't use a video file, we use the camera stream), we grab an image and apply SIFT to it to detect key points and extract features. SIFT is a substantial method by David G. Lowe, so we cannot explain it fully here, but the essential point is that a key point is a point that stays stable when the image is scaled, moved, or resized. For that reason, our program keeps detecting the box when we move it anywhere, in any direction.
OK, all the key points are extracted; now let us look at how the algorithm finds the object in each frame coming from our camera.
Matching descriptor vectors
FlannBasedMatcher matcher;
std::vector< DMatch > matches;
matcher.match( descriptors_object, descriptors_scene, matches );
The code above matches the key points of the 2 images. The element of a key point used for matching is the descriptor; we don't use the other elements. Each SIFT descriptor contains 128 elements (read Dr. D. Lowe's article for the details). You may use any method for the matching step, but OpenCV provides 2 interesting ones: the Brute-Force matcher and the FLANN matcher. Each has its own advantages and disadvantages. We suggest you read about them to understand and choose the best method for your case: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_matcher/py_matcher.html. In this project we use the FLANN matcher.
Filter the matching
In addition to the 3 steps above, we can add some optional steps. In this project we add a step that filters the matching pairs, which helps us discard wrong matches. In particular, we use the RANSAC algorithm. Don't worry, that algorithm is also implemented in OpenCV. If you want something simpler, you can filter the matching pairs with another method, such as keeping only the shortest distances; we have used it and the result is not bad. However, we use the RANSAC algorithm because of its advantages: https://en.wikipedia.org/wiki/Random_sample_consensus
findHomography finds the transform between the matched key points, called the homography matrix.
Mat H = findHomography( obj, scene, CV_RANSAC );
Based on the homography matrix and the 4 corners of the object, we can calculate the position of each corner in the frame image with a perspective transform. To keep things simple, OpenCV provides a function named perspectiveTransform that calculates the mapped object in the scene (i.e., maps the points) for us.
perspectiveTransform( obj_corners, scene_corners, H );
//-- Draw lines between the corners (the mapped object in the scene)
line( img_matches, scene_corners[0] + Point2f( img_object.cols, 0), scene_corners[1] + Point2f( img_object.cols, 0), Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[1] + Point2f( img_object.cols, 0), scene_corners[2] + Point2f( img_object.cols, 0), Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[2] + Point2f( img_object.cols, 0), scene_corners[3] + Point2f( img_object.cols, 0), Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[3] + Point2f( img_object.cols, 0), scene_corners[0] + Point2f( img_object.cols, 0), Scalar( 0, 255, 0), 4 );
When the four corners of the object are found in the frame image, we draw a rectangle connecting them. The area inside the rectangle is the object we want to detect, and the position of the object is the centre of the rectangle.
Usage of the code
To start your own project more easily, you can have a look at our source code first. Then you can create your own code or modify ours.
Check our Github
git clone https://github.com/Booppey/object_detection.git
Then open your terminal, navigate to the folder of the project and type:
g++ -o object_detection object_detection.cpp -L/home/user_name/opencv-2.4.11/build/install/lib -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_features2d -lopencv_flann -lopencv_nonfree -lopencv_calib3d
./object_detection
Those are the commands we used in our case; you may need to change some paths (for example the OpenCV library path) to run them on your machine. If you have any difficulty or issue, don't be shy to let us know by leaving a comment.
Other applications of object detection
Video of this module applied to object tracking with the Parrot Bebop drone:
In this video, the Parrot Bebop drone follows the box, keeping it in the centre of the drone's camera. The drone can turn, go up, down, forward and backward. When the box leaves the camera's field of view, the drone turns around to find it again.
For this application we used the algorithm explained above, but we changed the architecture to client-server. The image processing part runs on the server side; the Android phone and the robot make up the client side.
In addition to the 2 applications above, you can see our other applications at
Feature-based approaches like SIFT, SURF, etc. are usually chosen when the training image has strong features. They can also be useful when the match in the search image may be transformed in some way. For a training image without strong features, we can apply other approaches: color-based detection for images with simple colors, template-based matching (which can be effective when the size of the object does not change), etc.