Welcome to the first tutorial in our machine learning series. Today we will cover image recognition techniques and write our first image recognition code. A friendly note: we will explain the basics of image recognition mostly using built-in functions. If you want the mathematical details, you can check them out here, in our other blog.
Let’s get started by learning a bit about the topic itself. Image recognition is, at its heart, image classification, so we will use these terms interchangeably throughout this course. We see images or real-world items and we classify them into one (or more) of many, many possible categories. The categories used are entirely up to us to decide. For example, we could divide all animals into mammals, birds, fish, reptiles, amphibians, or arthropods. Alternatively, we could divide animals into carnivores, herbivores, or omnivores. We could also divide animals by how they move, such as swimming, flying, burrowing, walking, or slithering. There are potentially endless sets of categories that we could use.
For starters, contrary to popular belief, machines do not have infinite knowledge of what everything they see is. Let’s say we’re building some kind of program that takes images or scans its surroundings. It will take in all that information, and it may store and analyze it, but it doesn’t necessarily know what everything it sees is. It might not be able to pick out every object.
In this tutorial series, we will gently build up an understanding of machine learning for image processing. Everything might seem difficult at first, but you don’t learn unless you throw yourself in.
Let’s continue. Machines only have knowledge of the categories that we have programmed into them and taught them to recognize. In fact, this goes beyond image recognition: machines, as of right now at least, can only do what they’re programmed to do. So if we train an image recognition model to recognize one of 10 categories, it will never recognize anything outside of those 10 categories.
Before performing any task related to images, it is almost always necessary to first process the images to make them more suitable as input data. In this article I will focus on image processing, specifically how we can convert images from JPEG or PNG files to usable data for our neural networks. Then, in other articles, I will concentrate on the implementation of a classic convolutional neural network.
Before we do any image processing, we need to understand how image files work. Specifically, we’ll discuss how these files use byte data and pixels to represent images.
If you’ve ever looked at an image file’s properties, you will have seen the dimensions of the image, i.e. its height and width. The height and width are measured in pixels. For example, if the dimensions of an image are 400×300 (width × height), then the total number of pixels in the image is 400 × 300 = 120,000.
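To make the arithmetic concrete, here is a minimal sketch in plain Python. The helper names count_pixels and count_values are hypothetical and exist only for this illustration. The second helper assumes each pixel stores a fixed number of integers (3 for RGB), which is the idea behind the channels argument discussed later in this article.

```python
def count_pixels(width, height):
    """Total number of pixels in a width x height image."""
    return width * height


def count_values(width, height, channels):
    """Total number of integers when every pixel stores `channels` values."""
    return count_pixels(width, height) * channels


# The 400x300 example from the text
num_pixels = count_pixels(400, 300)
print(num_pixels)  # 120000

# Assuming an RGB image, i.e. 3 integers per pixel
rgb_values = count_values(400, 300, channels=3)
print(rgb_values)  # 360000
```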
The function tensorflow.io.read_file takes the file name as its required argument and returns the contents of the file as a tensor with type tensorflow.string. When the input file is an image, the output of tensorflow.io.read_file will be the raw byte data of the image file. Although the raw byte output represents the image’s pixel data, it cannot be used directly. Let’s first see the implementation in Python using the soccer ball image.
import tensorflow as tf
# Read the raw bytes of the image file into a scalar string tensor
values = tf.io.read_file('soccer_ball.jpg')
Now that we have learned how to load an image, it is time to decode the image data into pixel data using TensorFlow.
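If you do not have soccer_ball.jpg at hand, the following sketch shows the same behavior on a tiny image that it creates itself. It assumes TensorFlow 2.x with eager execution. The file name tiny.png and the temporary directory are made up for this example. The point is only that read_file hands back the file's bytes unchanged, as a scalar string tensor.

```python
import os
import tempfile

import tensorflow as tf

# Build a tiny all-black RGB image (height=2, width=3) and save it as a PNG.
# This file is a hypothetical stand-in for a real image on disk.
pixels = tf.zeros([2, 3, 3], dtype=tf.uint8)
path = os.path.join(tempfile.mkdtemp(), "tiny.png")
tf.io.write_file(path, tf.io.encode_png(pixels))

# read_file returns the whole file as one scalar tensor of type tf.string
values = tf.io.read_file(path)
is_string = values.dtype == tf.string
is_scalar = values.shape.rank == 0

# The tensor holds the encoded file bytes, not pixel values.
# Every PNG file starts with the same 8-byte signature.
raw = values.numpy()
has_png_signature = raw[:8] == b"\x89PNG\r\n\x1a\n"
print(is_string, is_scalar, has_png_signature)
```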
The decoding function that we use depends on the format of the image. For generic decoding (i.e. decoding any image format), we use tensorflow.image.decode_image. If we know the input is a PNG image we use tensorflow.image.decode_png, and if it is a JPEG image we use tensorflow.image.decode_jpeg.
Since tensorflow.image.decode_image can decode any type of image, you might be wondering why we even bother with the other two decoding functions. One reason is that you may want to only use specific image formats, in which case it’s more efficient and better for code clarity to just use the format-specific decoding function.
Another reason is that tensorflow.image.decode_image supports GIF decoding, which results in an output shape of (num_frames, height, width, channels). Since the function can return data with different shapes, we can’t use tensorflow.image.decode_image when we also need to resize the image with tensorflow.image.resize (named tensorflow.image.resize_images in TensorFlow 1.x).
We can change the pixel format of the decoded image via the channels keyword argument. The channels argument represents the number of integers per pixel. The default value for channels is 0, which means the decoding function uses the interpretation specified from the raw data. Setting channels to 1 specifies a grayscale image, while setting channels to 3 specifies an RGB image. For PNG images we’re also allowed to set channels to 4, corresponding to RGBA images. Setting channels to 2 is invalid.
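The effect of the channels argument is easiest to see on the output shapes. The sketch below assumes TensorFlow 2.x with eager execution. It builds a small RGB image in memory and encodes it as PNG, instead of reading a file, purely to keep the example self-contained. The pixel values are arbitrary.

```python
import tensorflow as tf

# A 2x3 RGB image built in memory (arbitrary values, for illustration only)
rgb = tf.constant(
    [[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
     [[10, 20, 30], [40, 50, 60], [70, 80, 90]]],
    dtype=tf.uint8,
)
png_bytes = tf.io.encode_png(rgb)

# channels=0 (the default): keep whatever the file itself stores, here RGB
as_stored = tf.image.decode_png(png_bytes)
# channels=1: grayscale, one integer per pixel
as_gray = tf.image.decode_png(png_bytes, channels=1)
# channels=4: RGBA, four integers per pixel (allowed for PNG)
as_rgba = tf.image.decode_png(png_bytes, channels=4)
# The generic decoder also handles PNG input
generic = tf.image.decode_image(png_bytes)

print(as_stored.shape)  # (2, 3, 3)
print(as_gray.shape)    # (2, 3, 1)
print(as_rgba.shape)    # (2, 3, 4)
print(generic.shape)    # (2, 3, 3)
```

In every case the first two dimensions are height and width, and only the last dimension changes with channels.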
Sometimes we need to resize the image, for example as part of data augmentation. The function we use for resizing pixel data is tensorflow.image.resize (named tensorflow.image.resize_images in TensorFlow 1.x). It takes in two required arguments: the original image’s decoded data and the new size of the image, which is a tuple/list of two integers representing new_height and new_width, in that order.
import tensorflow as tf
def decode_image(filename, image_type, resize_shape, channels):
    # Read the raw bytes of the image file
    value = tf.io.read_file(filename)
    # Use the format-specific decoder when the format is known
    if image_type == 'png':
        decoded_image = tf.image.decode_png(value, channels=channels)
    elif image_type == 'jpeg':
        decoded_image = tf.image.decode_jpeg(value, channels=channels)
    else:
        decoded_image = tf.image.decode_image(value, channels=channels)
    # Resize only PNG/JPEG data, whose decoded shape is always (height, width, channels)
    if resize_shape is not None and image_type in ['png', 'jpeg']:
        decoded_image = tf.image.resize(decoded_image, resize_shape)
    return decoded_image
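As a quick check of the argument order, here is a minimal sketch assuming TensorFlow 2.x, where the function is called tf.image.resize. The all-zero input tensor stands in for decoded pixel data. Note that the size is given as (new_height, new_width) and that the default resizing method returns float32 values rather than the original uint8.

```python
import tensorflow as tf

# Stand-in for a decoded 400x300 (width x height) RGB image.
# Decoded data has shape (height, width, channels).
image = tf.zeros([300, 400, 3], dtype=tf.uint8)

# Halve both dimensions: the size argument is (new_height, new_width)
resized = tf.image.resize(image, [150, 200])
print(resized.shape)  # (150, 200, 3)
print(resized.dtype)  # float32 with the default (bilinear) method

# Nearest-neighbor resizing keeps the input dtype
resized_nearest = tf.image.resize(image, [150, 200], method="nearest")
print(resized_nearest.dtype)  # uint8
```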
Normally, when we do image-related tasks, we’re dealing with a large amount of image data. In this case, it’s best to use a TensorFlow dataset, i.e. tensorflow.data.Dataset, to store all the images. We can create a dataset using the from_tensor_slices function.
The Dataset class makes it easier and more efficient to perform tasks with all the image files. After we create a dataset with the image files, we will need to decode each file’s contents into usable pixel data. Since the decode_image function works for single image files, we will need to use the dataset object’s map function to apply decode_image to each image file in our dataset.
The output of the map function is a new dataset with each element now converted from the original image file to its corresponding pixel data. We use map rather than a for loop to manually convert each image file because map can run the image decoding in parallel across the files when its num_parallel_calls argument is set (by default, elements are processed sequentially), making it a more efficient solution.
import tensorflow as tf
def get_dataset(image_paths, image_type, resize_shape, channels):
    filename_tensor = tf.constant(image_paths)
    dataset = tf.data.Dataset.from_tensor_slices(filename_tensor)
    def _map_fn(filename):
        decode_images = decode_image(filename, image_type, resize_shape, channels=channels)
        return decode_images
    # The map method applies _map_fn to every element of dataset
    map_dataset = dataset.map(_map_fn)
    return map_dataset
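The following self-contained sketch shows one way to ask map for parallel decoding. It is not the function above, just an illustration under a few assumptions: TensorFlow 2.4 or later (for tf.data.AUTOTUNE), four small PNG files that the example writes to a temporary directory itself, and a hypothetical helper named load_png. As far as I know, map processes elements one at a time unless num_parallel_calls is passed.

```python
import os
import tempfile

import tensorflow as tf

# Write four small PNG files to a temporary directory.
# These files are made up for this example.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(4):
    pixels = tf.fill([8, 6, 3], tf.constant(i * 10, dtype=tf.uint8))
    path = os.path.join(tmp_dir, f"img_{i}.png")
    tf.io.write_file(path, tf.io.encode_png(pixels))
    paths.append(path)


def load_png(filename):
    """Read one PNG file, decode it as RGB and resize it to 4x4."""
    raw = tf.io.read_file(filename)
    image = tf.image.decode_png(raw, channels=3)
    return tf.image.resize(image, [4, 4])


dataset = tf.data.Dataset.from_tensor_slices(paths)
# num_parallel_calls lets TensorFlow decode several files at once;
# AUTOTUNE leaves the degree of parallelism to the runtime.
dataset = dataset.map(load_png, num_parallel_calls=tf.data.AUTOTUNE)

# In TensorFlow 2.x a dataset can be looped over directly
shapes = [tuple(image.shape) for image in dataset]
print(shapes)
```

Every element comes out with shape (4, 4, 3), regardless of the size of the original file.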
The way we can extract the decoded image data from our Dataset is through a tensorflow.data.Iterator. We use the get_next function to obtain a next-element tensor, which is used for data extraction.
def get_image_data(image_paths, image_type, resize_shape, channels):
    dataset = get_dataset(image_paths, image_type, resize_shape, channels)
    # A one-shot iterator walks through the dataset exactly once
    iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
    # get_next returns the next decoded image from the dataset
    next_image = iterator.get_next()
    return next_image
Who said deep learning models required hours or days to train? My aim here was to showcase that you can come up with a pretty decent deep learning model in double-quick time. You should pick up similar challenges and try to code them from your end as well. There’s nothing like learning by doing!
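In TensorFlow 2.x with eager execution, a dataset can also be consumed with ordinary Python iteration, without the compat.v1 helper. The sketch below is an alternative under that assumption, not a replacement for the function above. It uses a small in-memory tensor in place of decoded image files so that it runs on its own.

```python
import tensorflow as tf

# Two 2x2 grayscale "images" held in memory (values 0..7, for illustration)
images = tf.reshape(tf.range(8, dtype=tf.float32), [2, 2, 2, 1])
dataset = tf.data.Dataset.from_tensor_slices(images)

# iter() and next() play the role of the iterator and get_next
iterator = iter(dataset)
first = next(iterator)
second = next(iterator)
print(first.shape)  # (2, 2, 1)

# A for loop (or a comprehension) visits every element once
all_images = [image.numpy() for image in dataset]
print(len(all_images))  # 2
```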
Did you find this article helpful? Do share your valuable feedback in the comments section below. Feel free to share your complete code notebooks as well which will be helpful to our community members.