
Color object tracking with STM32 + OV7725

Abstract
DEVELOPMENT OF AN EMBEDDED SYSTEM FOR TARGETING A COLOR OBJECT USING A VIDEO CAMERA INTEGRATED TO A MICROCONTROLLER

This project uses an STM32F103 microcontroller to track an object. The image is obtained from an OV7725 camera + FIFO, configured for RGB565 QVGA (320×240).
On the touchscreen the target object can be selected; its color defines the threshold used to binarize the image. After segmentation, an algorithm recognizes the contour of the object and its center; once it is located, a PI controller moves 2 servos (pan, tilt) in order to target the objective.
A video of the system doing real-time tracking can be seen at the bottom of the post. The source code and Keil project for the STM32F103VCT device can be downloaded here: Image_Processing.zip

Objective
To design an embedded system on a microcontroller for detecting and targeting a colored object, without the need for an external processing system (PC).


CAPTURING AND DISPLAYING THE IMAGE

 
OV7725 interfacing
The OV7725 video camera (see Figure 0.1) is used; it is a CMOS sensor capable of capturing up to 60 frames per second. Its main features are listed below:

  • Operating voltage: 3.3 V
  • Active consumption: 120 mA

Figure 0.1 – OV7725 Camera module

This camera supports multiple output formats. RGB565 was chosen because it carries the most information and can also be displayed directly by the LCD used in this project.


Figure 0.2 – Several RGB color formats; the numbering corresponds to the number of bits used for each component.

Thus the RGB565 format has 2^(5+6+5) = 65536 color combinations, which is why it is also referred to as the 65k format. Note that it fully uses 16 bits, so it is commonly called "16-bit color". Six bits are used for green because the human eye is more sensitive to that color.
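As a quick illustration (not taken from the project code), packing three 8-bit color components into one 16-bit RGB565 word looks like this in C:

#include <stdint.h>

/* Pack 8-bit R, G, B values into RGB565: 5 bits red, 6 bits green,
   5 bits blue. The low-order bits of each component are discarded. */
static inline uint16_t rgb565_pack(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) |   /* red   -> bits 15..11 */
                      ((g & 0xFC) << 3) |   /* green -> bits 10..5  */
                      ( b >> 3));           /* blue  -> bits 4..0   */
}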


Figure 0.3 – Transmission timing of the data for an entire row in the RGB565 format

As illustrated in Figure 0.3, this format sends 2 bytes of information per pixel. For the data to be valid, HREF must be high, which indicates the beginning of a row of the image; a byte is then read on D[9:2] every time PCLK goes to logic one. On the second PCLK cycle the second byte is read, completing the information for a single pixel. It is important to note that PCLK is always active, even when HREF is low, but no data should be taken in that condition.

If the output format were YCbCr 4:2:2, the microcontroller would have to convert each pixel to RGB565 before displaying it on the LCD screen, an undesirable increase in processing time.

Pixels are arranged in rows and columns, as a rectangular array. The number of columns multiplied by the number of rows gives the resolution of the image.
The highest resolution this camera supports is known as VGA (640×480); however, the screen used is 320×240, which is why the output resolution of the camera is QVGA (320×240). This results in 76800 pixels per frame. A lower-resolution image would imply a smaller area for detecting the object, as well as a smaller image shown on the screen.


Figure 0.4 – Transmission timing for a complete frame in QVGA format (320×240), where tp = 2 × tPCLK

Figure 0.4 shows the timing used in QVGA format to send a full frame: a transition of VSYNC to logic one signals the start of a new frame, after which it returns to logic zero. The first line is transmitted while HREF is high; once the 320 pixels of the line have been transmitted, HREF drops to zero and then returns high for the start of the next line. The process repeats until the 240 lines that make up a QVGA frame have been transmitted.

While the clock signal XCLK is present, the internal signal PCLK oscillates, so the sending of data bytes is continuous. That is why an external AL422B FIFO memory is used: it holds 393216 bytes, collects the data using the synchronization signals, and waits for the microcontroller to access each byte of data.


Figure 0.5 – Diagram of communication between the OV7725 camera and the microcontroller.

The goal is for the microcontroller to access the data whenever required, instead of constantly servicing interrupts from the camera.
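Below is a minimal sketch of how the microcontroller might clock pixel data out of the AL422B. The helper names (fifo_read_reset, fifo_rclk_low/high, fifo_data_bus) are placeholders for the project's actual GPIO macros, not its real API. Note that a full QVGA frame in RGB565 is 150 KB, more than the STM32F103's RAM, so each pixel has to be processed as it is read:

#include <stdint.h>

/* Hypothetical pin helpers -- replace with the board's GPIO macros.
   /RRST resets the FIFO read pointer; each RCLK cycle presents the
   next byte on the FIFO's 8-bit output bus. */
extern void    fifo_read_reset(void);
extern void    fifo_rclk_low(void);
extern void    fifo_rclk_high(void);
extern uint8_t fifo_data_bus(void);

/* Read one RGB565 pixel from the FIFO: two bytes, high byte first. */
static uint16_t fifo_read_pixel(void)
{
    fifo_rclk_low();
    fifo_rclk_high();
    uint16_t hi = fifo_data_bus();   /* first byte of the pixel  */
    fifo_rclk_low();
    fifo_rclk_high();
    uint8_t  lo = fifo_data_bus();   /* second byte of the pixel */
    return (uint16_t)((hi << 8) | lo);
}

Each pixel returned this way can be forwarded directly to the LCD and compared against the color threshold on the fly.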

LCD interfacing
For image display, a 320×240 pixel TFT LCD is employed, controlled by an SSD1289 integrated circuit with 172800 bytes of GDDRAM (Graphic Display Data RAM).
This controller communicates with the microcontroller (see Figure 0.6) over an 8080 parallel interface. It is also important to note that the screen has a 4-wire resistive touch film connected to an XPT2046 integrated circuit, which locates the position of a single pressure point on the screen and reports it to the microcontroller over an SPI interface.


Figure 0.6 – Diagram of communication between the display and the microcontroller.

 


Figure 0.7 – SSD1289 pins used to transmit one pixel in RGB565 format over the 16-bit interface

The microcontroller has a peripheral called the FSMC (Flexible Static Memory Controller), which allows it to communicate with external memories, in this case the LCD.
To do this, the STM32F1 must first be configured with: the type of memory to be accessed (SRAM, ROM, NOR Flash, PSRAM), the data bus width (8 or 16 bits), the memory bank to be used, the wait states, write and read enables, among other features.
Once this is done, the microcontroller automatically reads and writes the external memory every time the assigned memory region or bank is accessed.
The settings for this particular case are: a 16-bit bus width, SRAM-type memory (memory bank from 0x60000000 to 0x6FFFFFFF), mode A; the microcontroller lines NOE, NWE, NE1 and A16 connect to /RD, /WR, /CS and D/C of the SSD1289 respectively.
The address pins A[25:0] of the microcontroller automatically take the value of the memory address.
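With NE1 mapped at 0x60000000 and A16 acting as the D/C line, the SSD1289 can then be driven by plain memory accesses. A minimal sketch follows; because the bus is 16 bits wide, A16 corresponds to internal address bit 17, so the data register sits 0x20000 above the command register. Treat the exact addresses as an assumption about this board's wiring:

#include <stdint.h>

/* FSMC bank 1 / NE1 starts at 0x60000000; A16 drives the SSD1289
   D/C pin. With a 16-bit bus, A16 is internal address bit 17. */
#define LCD_REG  (*(volatile uint16_t *)0x60000000u)  /* D/C = 0: command */
#define LCD_DATA (*(volatile uint16_t *)0x60020000u)  /* D/C = 1: data    */

static void lcd_write_reg(uint16_t reg, uint16_t value)
{
    LCD_REG  = reg;     /* select an SSD1289 register */
    LCD_DATA = value;   /* write its new value        */
}

/* Draw one RGB565 pixel at (x, y). 0x4E/0x4F are the SSD1289 GRAM
   address-counter registers and 0x22 is the GRAM write register. */
static void lcd_draw_pixel(uint16_t x, uint16_t y, uint16_t color)
{
    lcd_write_reg(0x4E, x);
    lcd_write_reg(0x4F, y);
    LCD_REG  = 0x22;
    LCD_DATA = color;
}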


Figure 0.8 – Operation of the FSMC peripheral as configured in this particular case



IMAGE PROCESSING

 
IMAGE SEGMENTATION
Segmentation, in this case, has the task of separating a region of interest in the image based on the chosen color. As each pixel is obtained from the camera it is compared with the threshold values, and the result is stored in the binary image. Pixels that are within the threshold are represented as logic one, while those outside it are logic zero. An example of the binary image when the searched color is green:


Figure 1.1 – Thresholding an image by color segmentation

The binary image occupies an array of 2400 32-bit numbers in the microcontroller, where each bit is one pixel of the binary image, storing the 76800 pixels that correspond to one frame.
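A sketch of how such a bit-packed binary image can be stored and accessed (the names are illustrative, not taken from the project):

#include <stdint.h>

#define IMG_W 320
#define IMG_H 240

/* 320 x 240 = 76800 pixels packed one bit each into 2400 words. */
static uint32_t bin_img[(IMG_W * IMG_H) / 32];

static inline void bin_set(uint32_t x, uint32_t y, int value)
{
    uint32_t idx = y * IMG_W + x;    /* linear pixel index */
    if (value)
        bin_img[idx >> 5] |=  (1u << (idx & 31));
    else
        bin_img[idx >> 5] &= ~(1u << (idx & 31));
}

static inline int bin_get(uint32_t x, uint32_t y)
{
    uint32_t idx = y * IMG_W + x;
    return (int)((bin_img[idx >> 5] >> (idx & 31)) & 1u);
}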

Threshold in the RGB color space
In the RGB565 format, the red component of each pixel can take 32 values (0 to 31), green 64 values (0 to 63), and blue 32 values (0 to 31). The three color components are separated with the help of three 8-bit variables.
A color is only admissible if it falls within all three ranges. As an example, the following figure shows two variants of orange and a threshold of 32; each of the three components is within the acceptable values.


Figure 1.2 – Different representations of orange

Two variants of the color orange; the values of the individual components are shown at the top, together with the permissible thresholds for the red, green, and blue components.
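A sketch of the per-component range test, assuming the threshold is applied symmetrically around the selected target color (helper names and parameters are illustrative):

#include <stdint.h>
#include <stdlib.h>

/* Extract the 5/6/5-bit components of an RGB565 pixel. */
#define R565(p) (((p) >> 11) & 0x1F)
#define G565(p) (((p) >>  5) & 0x3F)
#define B565(p) ( (p)        & 0x1F)

/* A pixel is accepted only if every component lies within its
   threshold of the target color (both given in RGB565). */
static int rgb_in_threshold(uint16_t pixel, uint16_t target,
                            int thr_r, int thr_g, int thr_b)
{
    return abs((int)R565(pixel) - (int)R565(target)) <= thr_r &&
           abs((int)G565(pixel) - (int)G565(target)) <= thr_g &&
           abs((int)B565(pixel) - (int)B565(target)) <= thr_b;
}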

Selecting a threshold in the RGB color space presents a problem: it does not separate hue from illumination. Therefore the normalized RGB space is used.
Also known as the rg chromaticity space, it is two-dimensional and contains no light-intensity information. Instead of representing the intensity of each color component (RGB), the proportion of each component (rgb) relative to the total light (I) is represented:

R + G + B = I

r = R/I, \qquad g = G/I, \qquad b = B/I
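On a Cortex-M3 without an FPU it makes sense to evaluate these proportions in fixed point. Here is a sketch of the normalization, scaling r and g to the range 0..255 (the scaling factor is an assumption, not necessarily what the project uses):

#include <stdint.h>

/* Compute r = R/I and g = G/I in 8-bit fixed point (value * 255);
   b follows from b = 1 - r - g. The 5/6/5-bit inputs are first
   scaled to a common 8-bit range. */
static void rgb_normalize(uint16_t r5, uint16_t g6, uint16_t b5,
                          uint8_t *r_n, uint8_t *g_n)
{
    uint32_t R = (uint32_t)r5 << 3;
    uint32_t G = (uint32_t)g6 << 2;
    uint32_t B = (uint32_t)b5 << 3;
    uint32_t I = R + G + B;          /* total light intensity */

    if (I == 0) {                    /* black pixel: ratio undefined */
        *r_n = *g_n = 0;
        return;
    }
    *r_n = (uint8_t)((R * 255u) / I);
    *g_n = (uint8_t)((G * 255u) / I);
}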


Figure 1.3 – Color spectrum


Figure 1.4 – Three different color pixels are chosen in an orange sphere to analyze their properties.

Clearly the red component has changed by nearly half of its value, so selecting a fixed range of acceptable values for red (R) in RGB space is impractical due to the large magnitude of the variation. However, as the illumination changes, the components keep the same proportions of r, g, and b.


Figure 1.5

(a) An orange sphere is illuminated with 2 side LED lights whose intensity is controlled by pulse-width modulation (PWM); 4 experiments with decreasing light levels were performed (for both types of segmentation, the color to be detected was sampled at the highest illumination level)

(b) The resulting segmentation using the RGB565 color space

(c) The resulting segmentation using the normalized RGB color space.

DESCRIPTION OF THE REGION OF INTEREST
To analyze the segmented data, an algorithm is run on the binary image. For this it is assumed that the image consists of ones and zeros, where the pixels of the color of interest are logic one.
The algorithm developed demarcates the region of interest by tracing the outline of the group of contiguous pixels present in the binary image. Once this is done, the result is delivered as the upper, lower, right-side and left-side limits, together with the horizontal and vertical location of the center of the object.


Figure 1.6 – Region of interest detection

To achieve this, each row of the binary image is inspected from left to right, going from top to bottom. If a group of contiguous logic-one pixels exceeds a preset width, the contour walk starts from the first pixel of that run.


Figure 1.7 – Contour detection algorithm

The figure shows how the algorithm runs over a detected group of dark pixels. The process begins once the initial line is detected (in this case the initial line is 1 pixel wide; the starting pixel is shown in red). From here the contour walk begins: as a rule, the algorithm searches for the next valid pixel counter-clockwise within a 3×3 matrix, starting the inspection one step after the last detected pixel. To continue the contour detection, the center of the next 3×3 matrix is placed at the pixel detected in the previous step.

Whenever a contour pixel is detected its position is evaluated: if it is less than the minimum position, the minimum is updated, and likewise for the maximum. Thus at the end of the algorithm the maximum and minimum positions, both vertical and horizontal, are obtained. The contour inspection stops once the pixel corresponding to the initial position is reached again.
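Below is a simplified sketch of such a contour walk, written as a Moore-neighbour trace over the 3×3 neighbourhood, which accumulates the bounding box as it goes. The stopping rule is reduced to "return to the start pixel" plus a step limit, a simplification of the full algorithm:

#include <stdint.h>

#define IMG_W 320
#define IMG_H 240

extern int bin_get(uint32_t x, uint32_t y);   /* 1 if pixel is object */

/* 8-neighbourhood offsets in counter-clockwise order (0 = east). */
static const int8_t DX[8] = { 1, 1, 0, -1, -1, -1, 0, 1 };
static const int8_t DY[8] = { 0, -1, -1, -1,  0,  1, 1, 1 };

/* Walk the contour starting from (sx, sy), the first object pixel
   found by the row scan, and accumulate the region's bounding box. */
void trace_contour(int sx, int sy,
                   int *xmin, int *xmax, int *ymin, int *ymax)
{
    int x = sx, y = sy;
    int dir = 0;        /* treat the start as if reached moving east */
    int steps = 0;

    *xmin = *xmax = sx;
    *ymin = *ymax = sy;

    do {
        int found = 0;
        /* Scan the neighbourhood counter-clockwise, starting just
           past the pixel we arrived from. */
        for (int i = 0; i < 8; i++) {
            int d  = (dir + 5 + i) & 7;
            int nx = x + DX[d], ny = y + DY[d];
            if (nx >= 0 && nx < IMG_W && ny >= 0 && ny < IMG_H &&
                bin_get((uint32_t)nx, (uint32_t)ny)) {
                x = nx; y = ny; dir = d; found = 1;
                break;
            }
        }
        if (!found)
            break;                   /* isolated single pixel */

        if (x < *xmin) *xmin = x;    /* update the bounding box */
        if (x > *xmax) *xmax = x;
        if (y < *ymin) *ymin = y;
        if (y > *ymax) *ymax = y;
    } while ((x != sx || y != sy) && ++steps < IMG_W * IMG_H);
}

The object's center then follows directly as ((xmin + xmax) / 2, (ymin + ymax) / 2).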


Figure 1.8 – Different objects of similar color are recognized separately



PHYSICAL SETUP OF THE PLATFORM
The platform where the camera sits is controlled by two servo motors in a pan-tilt arrangement, giving it two degrees of freedom. Each actuator is controlled using pulse-width modulation (PWM). For this, the pins available on the HY-Smart STM32 development board are used; the two PWM outputs that can be used are Timer2 (channel 2) and Timer3 (channel 1). These drive the HS-785HB servo motor (located at the bottom of the platform) and the HS-645MG (located on top).


Figure 1.9 – Pan/tilt servo platform

The two servomotors are controlled with pulses every 20 ms; the pulse width determines their position. Since the timer has 16-bit resolution, a suitable configuration must be found so that the PWM period equals 20 ms. The microcontroller provides a prescaler, and the timer limit sets the count at which the counter reloads. Each pulse period is thus given by the equation:

Period = \frac{2}{F_{sys}} * (1+TIM\_PRESC) * (1+TIM\_LIM)

Where Fsys is the system frequency (72 MHz in this case), TIM_PRESC is the prescaler value and TIM_LIM is the limit; both can take values between 0 and 65535. With these settings, the desired period is obtained by assigning a value of 11 to the prescaler and 59999 to the limit.
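A sketch of this configuration for Timer3 channel 1, written against the Standard Peripheral Library (whether the project uses the SPL or direct register writes is an assumption; GPIO and pin setup are omitted). Timer2 channel 2 is configured analogously:

#include "stm32f10x.h"

void servo_pwm_init(void)
{
    TIM_TimeBaseInitTypeDef tb;
    TIM_OCInitTypeDef       oc;

    RCC_APB1PeriphClockCmd(RCC_APB1Periph_TIM3, ENABLE);

    /* Prescaler 11 and limit 59999, as derived above -> 20 ms period. */
    TIM_TimeBaseStructInit(&tb);
    tb.TIM_Prescaler   = 11;
    tb.TIM_Period      = 59999;
    tb.TIM_CounterMode = TIM_CounterMode_Up;
    TIM_TimeBaseInit(TIM3, &tb);

    /* 60000 counts per 20 ms -> 3000 counts per ms, so 4500 counts
       gives a 1.5 ms pulse, roughly the servo's center position. */
    TIM_OCStructInit(&oc);
    oc.TIM_OCMode      = TIM_OCMode_PWM1;
    oc.TIM_OutputState = TIM_OutputState_Enable;
    oc.TIM_Pulse       = 4500;
    TIM_OC1Init(TIM3, &oc);

    TIM_Cmd(TIM3, ENABLE);
}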

DIGITAL IMPLEMENTATION OF A PI CONTROLLER
The control algorithm used for both servos is a proportional-integral (PI) controller. First, the effect of the control action on the output variable must be determined for an approximate model of the plant.
The control action is the pulse width sent to the servo motors; the measured variable is the detected position of the object on the screen. The ratio between the length of the pulse sent to the servos and the change in detected position on the screen must be established. This is done by measuring how much the timer value changed when the object shifted a certain number of pixels on the screen; this value is the gain of the system (it is important to note that the gain is not necessarily the same for both servos).
Here is the equation of the digital PI controller that I derived:

U_{[k]} = U_{[k-1]} + E_{[k-1]} * (Ki* \frac{T}{2} - Kp) + E_{[k]} * (Ki * \frac{T}{2} + Kp)

Where U[k] is the value of the control action (value servomotor PWM pulse) at that instant of time, the controller updates its value every time a frame is recognized, this happens every 115mS and becomes equivalent to sampling time T, Kp and Ki are determined experimentally.


LOCATION TESTS

 


Figure 2.1 – Initial location test

The orange triangle moves in a clockwise circle (like a clock). The contour detection of the object and the tracking by the system can be appreciated.


Figure 2.2 – One revolution every 1108 ms

 


Figure 2.3 – One revolution every 3820 ms

 


Figure 2.4 – One revolution every 21170 ms


CONTROLLER TESTS

 
A step input was recreated for both axes of motion: a colored object is placed in one corner of the camera's field of vision and tracking is then activated, so the servos move the camera until the object is centered.

Figure: a) System response when targeting an object initially located in the upper left corner; b) the same test.


Figure 3.1 – Tracking/controller test 1

Figure: a) System response when targeting an object initially located in the lower right corner; b) the same test.


Figure 3.2 – Tracking/controller test 2



Tests of the real-time performance of the tracking system (Nov 2013) can be seen in the video below:

  1. mohammad
    February 10th, 2014 at 02:19 | #1

hi,
    great project. I would like to make this project, hope you can help me.
    thanks

  2. Diego
    February 13th, 2014 at 15:09 | #2

Thanks, tell us which hardware you are going to use.

  3. Gary Olson
    February 14th, 2014 at 08:45 | #3

    Hello,
    Nice project.
The newer ST STM32F429I DISCO with LCD would be an interesting board to use for this project. It would also be a great development platform due to its low price and included LCD.

    Thanks for some great tutorials I have watched on Youtube, young man. Looking forward to your future posts.
    Gary Olson

  4. Diego
    February 18th, 2014 at 17:16 | #4

Hi, of course an FPU and a faster clock will take the performance to a whole new level!! In the future I would try to implement the system on the F4 line. Thanks

5. deeriee
    February 21st, 2014 at 17:23 | #5

    hi diego, can you please provide the clean version of the code?

  6. Diego
    February 23rd, 2014 at 04:14 | #6

    @deeriee
Hi Deeriee, that is actually the "cleanest" version of the code. I'm making improvements and adding a user interface to configure parameters, so the code is in fact messier right now… But the UI is looking good, and the performance is a little bit better… I know the code provided is not clean at all, but believe me, it is efficient. If you have any question in particular just ask and I'll explain.

  7. Lin_811
    June 5th, 2014 at 14:54 | #7

    hi,
I want to use it on the ST STM32F429I DISCO. Hope to be able to open it.
