Sep 26, 2023

Introduction to OpenCV

In a nutshell, OpenCV is a programming library used for computer vision. Originally written in C++, OpenCV provides bindings for the most popular programming languages, like Java or Python. In this post I will show you how to read and write images, change the color scale and perform binarization. For simplicity, I will use the Python bindings in the examples.

In Python we usually use a so-called virtual environment to isolate the dependencies of a project. There are many ways to create one; I decided to use pyenv (the virtualenv command comes from the pyenv-virtualenv plugin). Let’s create a new virtualenv named opencv:

pyenv virtualenv opencv
pyenv local opencv

Please note that the second line switches the currently used environment to the newly created one. Importantly, pyenv local writes a .python-version file into the current directory, so the environment is active ONLY inside that directory (and its subdirectories). To make sure that you’re using the right one, just type pyenv version (which prints its name). If you want to make this environment global, just type

pyenv global opencv

In that case it will always be your default env. Last but not least, we want to install the OpenCV library. The package is called opencv-python; we’ll also need matplotlib. We can simply use pip to install both packages.

pip install opencv-python matplotlib
pip freeze

If you can see opencv-python, numpy (installed as a dependency of opencv-python) and matplotlib in the output of the pip freeze command, it means that you are on the right side of the moon 🌝. Well done!
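The same check can be done from Python itself. This is only a minimal sketch: the helper name installed_version is made up for illustration, and it relies on the standard-library importlib.metadata module (Python 3.8+).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names as used in the pip install command above.
for name in ("opencv-python", "numpy", "matplotlib"):
    print(name, installed_version(name) or "NOT INSTALLED")
```

If any of the three lines prints NOT INSTALLED, the active environment is probably not the one you installed the packages into.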

Now it’s the right time to meet the hero of the episode: my dog Bambo (a 2.5-year-old pug 🐶). His picture will be used as test material. By the way, a fun fact: the standard image used for testing image processing algorithms is called Lena. Lena (a Swedish model) appeared in Playboy magazine in November 1972. In the linked wiki article you can find more information on why this particular picture was chosen. Now let’s get back to the specifics. Consider the following code:

import cv2
from matplotlib import pyplot as plt

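def rgb_hist(img):
    # OpenCV stores channels in BGR order, so index 0 is blue
    color = ('b', 'g', 'r')
    for i, col in enumerate(color):
        # 256 bins covering pixel values 0..255 of channel i
        histr = cv2.calcHist([img], [i], None, [256], [0, 256])
        plt.plot(histr, color=col)
        plt.xlim([0, 256])
    plt.show()

def display_img(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(0)  # wait until any key is pressed
    cv2.destroyAllWindows()

img = cv2.imread('IMG_3158.jpg')
if img is None:  # imread returns None when the file cannot be read
    raise FileNotFoundError('IMG_3158.jpg')
display_img('bambo', img)
rgb_hist(img)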

What can we see here? I’ve created two functions. The first one calculates the histogram of each channel. A typical color image contains 3 channels: Red, Green and Blue. Each channel, in turn, is a matrix of pixels: the smallest, uniform, indivisible fragments of the image. A pixel value is usually stored on 8 bits, which means it varies from 0 to 255 (256 possible values). The final color of each pixel comes from mixing its three channel values, a process called additive color synthesis. As a consequence, we can calculate the number of all possible colors, that is, all possible combinations of the three channel values:

256 ∙ 256 ∙ 256 = 16 777 216

Almost 17 million colors! In this color space a pixel with the composition (0, 0, 0) is black (edge of the wall), while (255, 255, 255) is white (center of the wall). And now explaining the histogram is a piece of cake. It’s nothing more than a plot where the X axis shows each possible pixel value of a given channel (0-255) and the Y axis shows the number of occurrences. The second function displays the original image in its natural color scale (hints: imshow takes the window name as its first argument, and waitKey(0) waits for any key press before the window is closed). Below you can find the images generated by the script.
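To make the idea of a histogram concrete, here is a small sketch in plain Python, without OpenCV. The function name channel_histogram and the sample values are made up for illustration; cv2.calcHist does the same counting on real image channels, only much faster.

```python
def channel_histogram(values, levels=256):
    """Count how many times each intensity 0..levels-1 occurs in `values`."""
    counts = [0] * levels
    for v in values:
        counts[v] += 1
    return counts


# A tiny made-up "channel": mostly dark pixels plus a few brighter ones.
blue_channel = [20, 20, 21, 20, 155, 155, 20, 255]
hist = channel_histogram(blue_channel)

print(hist[20])   # how many pixels have the value 20
print(sum(hist))  # always equals the number of pixels
```

Plotting hist with the index on the X axis and the count on the Y axis gives exactly the kind of curve described above.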

Take a look at the histogram and the image. The pug is a very specific dog, because its coat contains mostly two colors: black and silver (or a silver-like color; sometimes it’s more peachy). And this is exactly what is reflected in the histogram. The first peak around (20, 20, 20) represents the black parts of the dog, like the flews or ears (remember that (0, 0, 0) is pure black?). If you still don’t believe me, please visit any RGB color picker on the internet and check what rgb(20, 20, 20) looks like. Another peak is around (155, 160, 170), a solid gray: a fragment of the wall (upper right corner). Everything between those two peaks is the dog’s fur. The fur is not entirely uniform; it contains many shades, so there is no single peak but rather a wide range of silver-like colors.

Now let’s transform the color scale into grayscale. As a result we’ll get a single channel with 8-bit pixels. In OpenCV the transformation is done using the following formula:

0.299 ∙ Red + 0.587 ∙ Green + 0.114 ∙ Blue
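The weighted sum can be tried out on single pixels in plain Python. This is only a sketch: the function name bgr_to_gray is made up, and OpenCV’s own implementation may round slightly differently in edge cases.

```python
def bgr_to_gray(b, g, r):
    """Weighted sum of one 8-bit BGR pixel, rounded to the nearest integer."""
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


print(bgr_to_gray(0, 0, 0))        # black
print(bgr_to_gray(255, 255, 255))  # white
print(bgr_to_gray(0, 255, 0))      # pure green
print(bgr_to_gray(255, 0, 0))      # pure blue
```

Note how green contributes far more to the result than blue: the weights reflect how bright each primary color appears to the human eye.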

Why change the color scale, you may ask? Well, there are many reasons. In a nutshell, speed is important. Less data to process (going from three channels to one reduces the data by two thirds) makes algorithms faster, especially in real-time operations. Also, color is often not important when trying to identify image features like edges. A quite exhaustive explanation can be found here.

img = cv2.imread('IMG_3158.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
display_img('bambo',gray)
histr = cv2.calcHist([gray],[0],None,[256],[0,256])
plt.plot(histr,color = 'black')
plt.xlim([0,256])
plt.show()
cv2.imwrite('./bambo_gray.jpg', gray)

As you can see, the conversion is pretty straightforward. The only thing to remember is that OpenCV keeps channels in reversed order (BGR), which is why the conversion code passed as the second argument is cv2.COLOR_BGR2GRAY. The histogram is calculated in a similar way as before, except that we don’t need to iterate over channels, as we have only one. The last line shows how to save an image in OpenCV: the first argument is the path to the output file, the second is the image itself.
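The reversed channel order is easy to trip over, so here is a tiny sketch of what it means, using nested lists instead of a real image. The helper bgr_to_rgb is made up for illustration; on real images cv2.cvtColor with cv2.COLOR_BGR2RGB does the same job.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a nested-list image."""
    return [[pixel[::-1] for pixel in row] for row in image]


# One row with two pixels, written in BGR order: pure blue, then pure red.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]
rgb_image = bgr_to_rgb(bgr_image)

print(rgb_image)
```

This matters whenever an image loaded by OpenCV is handed to a library that expects RGB, such as matplotlib: without the swap, blue and red trade places.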

Quite similar results to the RGB image, don’t you think? As you can see, we lose practically no image details during the conversion. Even the histogram looks quite similar to the one of the RGB image. Now we can go a step further and transform the image into a binary image (this operation is known as thresholding). A binary image contains only two pixel values, 0 or 1 (think about the benefits: a single channel where each pixel needs only 1 bit 🤯; note that OpenCV still returns an 8-bit image, using 0 and the max value you pass, here 255). The main purpose of this operation is to separate the foreground objects from the background. The classic approach to binarization picks the threshold so that the two resulting groups of pixels have minimal variance inside each group. This method has an official name: Otsu’s method. Looking only at the histogram, we can guess that the threshold is around 60. This is how we do it in OpenCV.

img = cv2.imread('IMG_3158.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
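# Manual threshold: pixels above 60 become 255, the rest 0
ret, thresh = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY)
# Otsu: the threshold argument (0) is ignored and computed from the histogram
ret_otsu, thresh_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(ret_otsu)
display_img('bin', thresh)
display_img('bin_otsu', thresh_otsu)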

I’ve binarized the image twice: first with the arbitrary value 60 (the threshold is passed as the second argument, the third is the max pixel value), and a second time with the threshold computed by the algorithm. The script told me that the computed value is 84 (cv2.threshold returns the threshold it used as its first output; when cv2.THRESH_OTSU is used, the second argument is ignored, so any value can be passed). As you can see, it’s not far from the value I guessed just by looking at the histogram. So we can conclude that sometimes observing the histogram alone might be enough to do the job. Below you can find both images: left, my own threshold of 60; right, Otsu’s 84.
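For the curious, the idea behind Otsu’s method can be sketched in a few lines of plain Python. Treat this as an illustration, not as OpenCV’s actual implementation: the function name otsu_threshold and the sample pixels are made up, and the code maximizes the variance between the two groups, which is equivalent to minimizing the weighted variance inside them.

```python
def otsu_threshold(hist):
    """Return t so that splitting into [0..t] and [t+1..] separates the groups best."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark group so far
    sum0 = 0  # sum of intensities in the dark group so far

    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0:
            continue  # dark group still empty
        if w1 == 0:
            break     # bright group empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-group variance (up to a constant factor)
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Made-up grayscale pixels: a dark group and a bright group.
pixels = [18, 20, 22, 19, 21, 198, 200, 202, 199, 201]

hist = [0] * 256
for p in pixels:
    hist[p] += 1

t = otsu_threshold(hist)
# Same rule as cv2.THRESH_BINARY: above the threshold -> 255, otherwise 0
binary = [255 if p > t else 0 for p in pixels]
print(t, binary)
```

On this toy data every threshold between the two groups separates them equally well, so the sketch simply keeps the first one it finds.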

Do we see Bambo as the main character in this image? For sure! For the sake of precision, I want to mention that global thresholding (like Otsu’s method) doesn’t always work in practice, especially when the image has uneven lighting. In such cases, adaptive thresholding with local thresholds should be used. In general, it works by splitting the image into smaller parts (so-called windows) and finding a local threshold for each of them. After thresholding each part separately, we join the parts back together. I will try to write a post about adaptive thresholding as well.
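As a teaser, the window idea can be sketched in plain Python. This is a deliberately simplified version under my own assumptions: the function name tile_threshold is made up, the windows do not overlap, and each window simply uses its mean as the local threshold.

```python
def tile_threshold(image, tile=2, max_value=255):
    """Binarize a 2D list tile by tile, using each tile's mean as its threshold."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]

    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = range(left, min(left + tile, width))
            values = [image[y][x] for y in rows for x in cols]
            local_t = sum(values) / len(values)  # local threshold of this window
            for y in rows:
                for x in cols:
                    result[y][x] = max_value if image[y][x] > local_t else 0
    return result


# Made-up image with uneven lighting: the same checker pattern,
# but the right half is much brighter than the left half.
image = [
    [10, 40, 110, 140],
    [40, 10, 140, 110],
]

local_result = tile_threshold(image)
global_result = [[255 if p > 75 else 0 for p in row] for row in image]

print(local_result)   # the checker pattern survives in both halves
print(global_result)  # a single global threshold only sees dark half vs bright half
```

OpenCV ships its own version of this idea as cv2.adaptiveThreshold, which computes the threshold from the neighborhood of every pixel instead of from disjoint tiles.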

Finally, I want to explain why I’ve chosen a picture of my pug as the test image. The final operation presented in this article was binarization. And if you look at a pug, you might think it’s a perfect example of binarization even before performing the operation: from the color point of view, there are only two groups (black and silver/silver-like). This is also the reason why we had no problem approximating the threshold from the histogram, without any computation.

I hope that you liked this content (and the above image as well 😆) /hint: cartoon filter/. See you in the next post! (I’m not promising it will be about image processing as well, but for sure this is not the last post in this series).