17. Lab: Image Transformations
In this lecture, we'll learn what exactly the tf.stack method does when it's applied to a list of tensors. In the last practical demo, we read in a bunch of images and then resized them to all be of the same size. In this demo, we'll perform some further transformations on images so that we get the hang of working with them. The code for this demo is in the file Image Transformations with Coordinator. Open up that file and follow along step by step. You'll find that we've already seen a huge chunk of this code before. Most of it involves reading in all the files, using threads, using a coordinator to manage those threads, setting up the file names in a queue, and using queue runners, which spin up the threads that the coordinator manages.
We're still working with the two images that I've set up of my dogs; those are in the original image list. Make sure you update it to point to your images; the image names and the number of images that you have might be different. Set up a queue of the image names using string_input_producer from the tf.train library. We use tf.WholeFileReader to read in images one entire file at a time, and we instantiate a session using a with statement. A lot of the code here is the same as before as well. Notice at the very top we instantiate the coordinator, which coordinates the loading of image files. We set up queue runners, which spawn off multiple threads that are coordinated by this coordinator.
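If you'd like to see that skeleton in one place, here's a minimal sketch of the setup, assuming the TF 1.x API that this course uses; the paths in original_image_list are stand-ins for your own image files:

```python
import tensorflow as tf

# Stand-in paths: point these at your own images
original_image_list = ["./images/dog1.jpg", "./images/dog2.jpg"]

# A queue that serves up the image file names one at a time
filename_queue = tf.train.string_input_producer(original_image_list)

# WholeFileReader reads one entire file at a time from the queue
image_reader = tf.WholeFileReader()

with tf.Session() as sess:
    # The coordinator manages the image-loading threads
    coord = tf.train.Coordinator()

    # Queue runners spawn the threads that the coordinator manages
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
```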
We then have a for loop which iterates over the images that we have. We use the image reader to read them from the file name queue. We decode these images as JPEG files and then resize them to be 224 by 224 in length and width. These are RGB images, which means the shape of the image tensor is 224 by 224 by three, where the last three represents the number of channels in that image: one for R, one for G, and one for B values. Color images have three channels. Once we have these resized images, we can perform transformations on these image tensors. The flip up down transformation flips the image upside down; the image is flipped along the horizontal axis.
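In code, the body of that loop might look something like this sketch, continuing the setup above:

```python
# Read one whole file from the file name queue
_, image_file = image_reader.read(filename_queue)

# Decode the JPEG; channels=3 gives us R, G, and B
image = tf.image.decode_jpeg(image_file, channels=3)

# Resize so every image is 224 x 224 x 3
image = tf.image.resize_images(image, [224, 224])
image.set_shape((224, 224, 3))

# Flip the image upside down, along the horizontal axis
image = tf.image.flip_up_down(image)
```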
Let's perform another transformation here. We'll call tf.image.central_crop. This will keep some portion of the image, specified by the fraction that you pass in. Here the fraction is 0.5, that is, 50%. Central crop will preserve the middle, or central, 50% of the image and crop out the rest. Let's now call session.run on this image and print out the shape of the resultant array, which is stored in image_array. image_array is a NumPy array of shape height by width by three, where three is the number of channels for that particular image. Calling tf.stack on this NumPy array will convert it to a tensor.
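Here's roughly what those steps look like, again as a sketch against the TF 1.x API; note that tf.stack applied to a single NumPy array simply converts it to a tensor of the same shape:

```python
# Keep only the central 50% of the image
image = tf.image.central_crop(image, central_fraction=0.5)

# Evaluate the ops to get a NumPy array of shape (height, width, 3)
image_array = sess.run(image)
print(image_array.shape)

# Convert the NumPy array back into a tensor of the same shape
# (tf.convert_to_tensor would do the same job here)
image_tensor = tf.stack(image_array)
print(image_tensor)
```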
If the original NumPy array had dimensions 224 by 224 by three, the tensor will have shape 224 by 224 by three. Print out this image tensor so you can see what it looks like, and append it to a list of images. Every element in this image list is an image tensor. At the end of the for loop, call coord.request_stop() and coord.join(threads) to wait for all images to finish loading and processing. Now, if we want to convert this list of image tensors to one tensor which represents all these images, we can use tf.stack. The tf.stack operation, applied to a list of tensors where every tensor in the list has a rank of r, will convert it to a single tensor of rank r plus one. In the list of images that we pass in, each tensor has a rank of three.
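A minimal sketch of that bookkeeping, with the names image_list and images_tensor chosen for illustration:

```python
image_list = []            # declared before the for loop

# ... inside the loop, after the transformations:
image_list.append(image_tensor)

# After the loop: stop and wait for the loader threads
coord.request_stop()
coord.join(threads)

# Stack the list of rank-3 tensors into a single rank-4 tensor
images_tensor = tf.stack(image_list)
print(images_tensor)       # shape: (number_of_images, 112, 112, 3)
```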
This will convert it to a single tensor of rank four. The images tensor is a rank four tensor where the first dimension is the index of an image and the remaining dimensions represent the images themselves. When we have a single tensor representing multiple images, writing out image summaries also becomes easier. You can apply the tf.summary.image operation on the tensor which represents the list of images, and write out the summary with the summary writer.
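As a sketch, writing that summary might look like this, assuming a ./logs directory for the summary writer:

```python
# One summary op covers the whole rank-4 tensor of images
summary_op = tf.summary.image("images", images_tensor)

summary_writer = tf.summary.FileWriter("./logs", graph=sess.graph)
summary_writer.add_summary(sess.run(summary_op))
summary_writer.close()
```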
Let's execute this block of code and see what the individual steps look like. Notice that we have two tensors representing images. These are both of rank three; each is 112 by 112 by three. We started out with images that were 224 by 224, but the central crop operation, which preserved only the center 50% of the image, converted these to 112 by 112 images. And when you apply tf.stack to the list of images, you get one tensor with a rank of four. The first dimension, which has the value two, indicates that there are two images in this list. The remaining dimensions are the dimensions of each image: 112 by 112 by three. It's important to note that a single tensor can be used to represent multiple images only if all those images are of the same size. If the sizes or dimensions of the individual images are different, they can't be represented in a single tensor.
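You can verify that constraint yourself; tf.stack rejects a list of tensors whose shapes don't match, as this small sketch shows:

```python
import tensorflow as tf

a = tf.zeros((224, 224, 3))
b = tf.zeros((112, 112, 3))

try:
    tf.stack([a, b])
except ValueError as err:
    # tf.stack refuses tensors of differing shapes
    print(err)
```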
Let's run TensorBoard and see what the computation graph looks like. Click on the link and go to the graphs tab. And here is the very complicated graph of our processing. The more processing we add, the more TensorBoard has to display, which explains why the graph looks like this. The images tab should show you that a single list of images has been written out, and it contains both images of my dogs. They have been central cropped and flipped upside down. And coming back to tf.stack, which we referenced at the beginning of this lecture: if you feed tf.stack a list of tensors where every tensor is of rank r, it will convert it to a single tensor of rank r plus one. All the tensors in the list will be represented by one tensor.
18. Introducing MNIST
Here is a question that I'd like you to keep in mind as we go through the contents of this video: can the MNIST database be used for both supervised and unsupervised learning, or is it only useful for unsupervised learning? Hi, and welcome to this module, where we'll be using TensorFlow for a simple machine learning algorithm. We'll use the K-Nearest Neighbors algorithm to recognize handwritten digits which are present in the form of images. We'll introduce the MNIST handwritten digit data set. It's a great data set to get started with machine learning and pattern recognition techniques.
We'll then understand how the K-Nearest Neighbors machine learning algorithm works before we apply it to this MNIST data set. And finally, we'll implement the K-Nearest Neighbors algorithm in TensorFlow and use it to identify handwritten digits from zero to nine, which are part of the MNIST data set. MNIST contains a large number of images where each image represents a handwritten digit. All these images are preprocessed and well formatted, which means they're very easy to use in your machine learning algorithms. MNIST stands for Modified National Institute of Standards and Technology.
The MNIST data set is freely available at Yann LeCun's site. You can see that it has a training set of 60,000 examples and a test set of 10,000 examples, all images of handwritten digits. Every image has a label associated with it which indicates the digit, zero through nine, that is represented by that image. You can see the gzip files which contain the training set, the test set, and the corresponding labels right here on screen. We won't be downloading these files from here, though; we'll just leave it to a TensorFlow library that will do that for us.
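For reference, the helper that ships with TF 1.x is the input_data module; a minimal sketch of loading MNIST with it follows. One caveat: this helper splits 5,000 of the 60,000 training images off into a validation set.

```python
from tensorflow.examples.tutorials.mnist import input_data

# Downloads and caches the four gzip files on the first run
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.num_examples)       # 55000
print(mnist.validation.num_examples)  # 5000, split off the 60000
print(mnist.test.num_examples)        # 10000
```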
This is how the images in the MNIST data set look. Each digit is in grayscale, which means it has just a single channel. Every image has a standard size: every image is 28 by 28, that is, 28 pixels wide and 28 pixels high, which gives us a total of 784 pixels to represent one image in the MNIST data set. You can imagine every image as being subdivided into a grid where each cell of the grid holds one pixel value. Every pixel holds a single value for the intensity of that pixel. Remember, this is a single channel image. Let's see how the values for a particular image might be laid out. If you look at the numbers on screen, you can see that the portions where there are strokes of the digit have higher intensity values.
The white spaces in the image have lower intensity values. The intensity values should give you an idea of what that digit is. So if you look really hard, you can see a four right there in that grid representing the image. Each image in the MNIST data set has a label associated with it, which tells us what digit is represented by that image. So each of these images on screen has a corresponding entry in the labels file. Using the MNIST data set is a great way to start off with machine learning and pattern recognition techniques. It's the equivalent of the Hello World program when you start off with programming languages.
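To make that layout concrete, here's a small sketch that assumes the mnist object from the loading sketch above:

```python
import numpy as np

# Assumes the mnist object from the earlier read_data_sets sketch
image = mnist.train.images[0]    # flat vector of 784 intensities in [0, 1]
label = mnist.train.labels[0]    # one-hot vector of length 10

grid = image.reshape(28, 28)     # back to the 28 x 28 pixel grid
print(grid.shape)                # (28, 28)
print(np.argmax(label))          # the digit this image represents
```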
Let's return to the question we posed at the start of this video. The statement on screen now is false. The MNIST database can be used for both supervised and unsupervised learning techniques because, as we saw, the MNIST database consists of images of handwritten digits which have been labeled with the actual correct label. Any time you have a data set or a corpus in which you have feature vectors as well as the correct labels, that is a perfectly acceptable data set for supervised learning techniques. You can use it to train your machine learning algorithm. So the statement that MNIST is only useful for unsupervised learning is false.
19. K-Nearest Neighbors
There is a question that I'd like us to think about as we go through the contents of this video: only supervised learning systems have a training step; in contrast, unsupervised learning systems do not explicitly have a training process. True or false? In this clip, we'll understand the K-Nearest Neighbors machine learning algorithm, which we are going to use to identify handwritten digits. Before that, we'll talk about machine learning algorithms in general for a little bit of context. Machine learning algorithms can be divided into two broad categories. The first is supervised learning algorithms. Here, the training data that you feed into your machine learning algorithm has labels associated with every element.
These labels are used to correct the algorithm; the error is fed back so the algorithm makes better predictions. Regression, which we saw earlier, is an example of a supervised machine learning algorithm. The other category is unsupervised machine learning algorithms. The model has to be set up right, but the model itself is responsible for understanding the structure and patterns that are present in the data. There is no training data set and no labels associated with the data to correct the algorithm. I'll take you through an overview of how supervised learning works. Let's assume that the input variable into the algorithm is x and the corresponding output variable is y.
y is the label that you have available for every element in your data set. The objective of a supervised learning algorithm is to find the mapping function f such that y = f(x). What is this function that generates y from x? The goal is to approximate the mapping function so well that when new data comes in, when you have new input data x, you can predict the output variable y for that data. Because we have pre-labeled data available to us, for our input x we have the corresponding labels y.
We'll use this existing data set to correct our mapping function approximation and get better approximations as a result. In unsupervised learning, we have the input data represented by x, but we have no corresponding output data available. The goal of the algorithm is to model the underlying structure in this data, find patterns within it in order to learn more about it, and make predictions. The machine learning algorithm here is kind of left to its own devices, with no supervision; the algorithm has to self-discover the patterns and structure in the data. We've mentioned the importance of training data in supervised learning models.
Unsupervised learning does not have training data. In the ML-based classifier that we saw earlier, the corpus of data that you pass in to generate the classifier is your training data. Training data is required in supervised learning techniques. K-Nearest Neighbors is a supervised learning algorithm which uses this training data to predict values for the input data set. It will try to find which element in the training data is most similar to the current sample. The K-Nearest Neighbors machine learning algorithm uses the entire training data set that you've made available to it as its model.
Every element in this data set is associated with a label. For example, the elements here will be labeled with what these images are, such as rocket, building, pig, and so on. This training data set is what we'll use to make predictions about any new data point which comes in. Let's say a new image comes in and we want to predict what it is an image of. The K-Nearest Neighbors algorithm will find out which element in the training data, or which image in the training data, this particular image is closest to or most similar to. It will try to find the nearest neighbor of this particular image.
How do you define similarity, or the nearest neighbor? There are a whole number of ways in which you could do that; we'll see an example in the next clip. But for now, imagine that we are comparing this image of a house with each of the images in the training data set. Is it like a building? Not really. Is it like the signal or the pig? Not really. Provided your comparisons for similarity have been set up correctly, you're likely to get the result that this house looks very much like a shop. K-Nearest Neighbors will find the element in your training data set that is most like the sample you're trying to evaluate. This is how the algorithm works logically.
The question now arises: how do we calculate the neighbors of a sample? How do we say that this bit of data is close to that other bit of data? We do this using something called distance measures. Distance measures indicate how far away one data point is from another. There are a whole host of distance measures that you can use to calculate the distance between data points. These distance measures don't just apply to coordinate geometry; they also extend to images, because images are nothing but matrices of numbers representing the pixel values. You might have heard of some of the most common distance measures, such as Euclidean distance, Hamming distance, Manhattan distance, et cetera.
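As a small illustration, here's how two of those measures look for a pair of flattened images, sketched with NumPy:

```python
import numpy as np

# Two flattened images, e.g. 784 pixel intensities each
a = np.random.rand(784)
b = np.random.rand(784)

# Manhattan (L1) distance: sum of absolute pixel differences
l1 = np.sum(np.abs(a - b))

# Euclidean (L2) distance: root of the summed squared differences
l2 = np.sqrt(np.sum((a - b) ** 2))

print(l1, l2)
```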
Distance measures are an important part of the K-Nearest Neighbors algorithm because they determine who your nearest neighbors are. We'll cover distance measures in a little more detail in the next clip. But before we are done talking about K-Nearest Neighbors, let's visually understand this concept. Imagine a two dimensional plane with a whole number of points on it, and imagine that each point represents an image. This is your training data set. Now let's say some test data comes in and you want to find the nearest neighbors for this test data.
Into which of these clusters will this image fall? You'll calculate the distance of this image from all its neighbors and find the nearest ones. If you find that the K nearest neighbors, in this case with K equal to three, are all blue points, it's safe to assume that this is a blue point as well. Let's say we have another point in our test data. We calculate the distance of this point from all other points and find that the nearest points are all red. In that case, we can conclude that this point is red as well. This, in essence, is what the K-Nearest Neighbors algorithm attempts to do.
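To tie this together, here's a minimal nearest neighbor sketch in TF 1.x. It uses L1 distance and K equal to one for brevity, and it assumes train_images, train_labels, and test_image come from the MNIST helper shown earlier:

```python
import numpy as np
import tensorflow as tf

# Placeholders for the full training set and one test sample
train_pl = tf.placeholder(tf.float32, [None, 784])
test_pl = tf.placeholder(tf.float32, [784])

# L1 (Manhattan) distance from the test sample to every training sample
distance = tf.reduce_sum(tf.abs(train_pl - test_pl), axis=1)

# Index of the single nearest neighbor (K = 1 to keep the sketch short)
nn_index = tf.argmin(distance, axis=0)

with tf.Session() as sess:
    idx = sess.run(nn_index, feed_dict={train_pl: train_images,
                                        test_pl: test_image})
    # The prediction is the label of that nearest neighbor
    print(np.argmax(train_labels[idx]))
```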
Let's now turn back to the question we posed at the start of this video. This statement is in fact true: only supervised learning systems have a training step. Recall that a training step is one in which we take a corpus of data, that is, data for which we already have the correct labels, and feed it into our program. The program then learns from the correct labels and the data what relationship exists between them. In unsupervised techniques, we do not have a corpus of correctly labeled instances. Instead, the algorithm in unsupervised learning techniques is just going to try to infer or figure out the relationships or patterns in the data for itself. So there is not an explicit training step in unsupervised learning systems.