15. Images As Tensors
Here is a question that I’d like you to keep in mind as we go through the contents of this video. There are different color formats out there. RGB and grayscale are two very common ones. Another common choice is CMYK. That’s an acronym for cyan, magenta, yellow and Key. Key is black. The question is how many channels would you need in order to represent a CMYK image in tensor form? How many channels would you require? This has nothing to do with the number of pixels in the image. In order to be able to use TensorFlow for image recognition problems, it’s going to be important for us to represent images as tensors.
Let’s talk about how we can accomplish this by relying on the pixels which constitute most image formats. Pixels can be thought of as little rectangles which together comprise an image. Each pixel holds some value based on the type of the image. If the image is a grayscale image, that value will be a shade of gray between zero and one. But if it’s a color image, it’s going to need some kind of color encoding. A common form of color encoding is RGB. RGB values are typically used for color images. Here there are three numbers, and each number ranges from zero to 255. Why this particular range? Because 256 values can be represented using eight bits.
Each of these three eight-bit numbers represents how much red, green, and blue there is in that particular pixel. Red, green, and blue are the primary colors, so by combining them in different proportions, you can get a very rich, very large number of colors. Let’s say, for instance, that you wish to represent pure red. Then your three numbers would be 255 for red and zero each for green and blue. Likewise, to represent pure green, you would set the red and blue numbers to zero and green equal to 255. And in exactly the same fashion, to represent pure blue, you would have red and green both set to zero and blue equal to 255.
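As a quick illustration, the pure primary colors described above can be written out as 8-bit triples. This is a small NumPy sketch for illustration only, not part of the lesson's original code:

```python
import numpy as np

# Each pixel is a triple of 8-bit values: (red, green, blue).
pure_red   = np.array([255, 0, 0], dtype=np.uint8)
pure_green = np.array([0, 255, 0], dtype=np.uint8)
pure_blue  = np.array([0, 0, 255], dtype=np.uint8)

# Mixing the channels in other proportions yields other colors;
# for example, equal parts red and green give yellow.
yellow = np.array([255, 255, 0], dtype=np.uint8)
```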
Notice that when we represent images in RGB format, we need three values to represent the color of each pixel, and this means that we have three channels. Let’s say we used an alternative color representation such as CMYK, which is an acronym for cyan, magenta, yellow and key. There we would need four values, and that would be termed a four-channel representation of color. For grayscale images, the representation is a lot simpler, because just one number suffices. Here we have just one channel, and each number in that channel, one per pixel, only represents intensity information.
This is a number from zero to one. You might find other representations which are scaled to larger ranges, but the number always represents intensity on a scale from black to white. The numbers zero and one at the extreme ends of this range represent black and white. Any number in between, say 0.5, represents a particular shade of gray. Here, just one value is needed to describe the intensity, and so grayscale images only use one channel. TensorFlow allows both grayscale and color images, because we can represent an image using either single-channel or multi-channel tensors.
And indeed, it turns out that both color and grayscale images can be represented as three-dimensional tensors. Let’s understand how. Two dimensions correspond to the x and y axes; these refer to the location of the pixel, so we need two dimensions whether the image is grayscale or color. So images can always be represented by a three-dimensional tensor, no matter how many channels are required. The number of channels specifies the number of elements in the third dimension, and this informs the shape of the corresponding three-dimensional tensor. Remember that the shape refers to the number of elements in each dimension of a tensor.
Let’s say, for instance, that we actually want to represent the grids which you see on screen now as tensors. Because there are six rows and six columns in each case, we would set the first two dimensions of the three-dimensional tensor to six. This is the same whether the image is grayscale or color. The third element in the shape vector of these tensors would vary. If the image is a grayscale one, that element will be one, because there’s just one channel. If the image is a color one, and we are making use of the RGB representation, then the shape vector will have a three as its last element.
This is a pretty standard way to represent images in TensorFlow. Use a tensor in which the first two dimensions correspond to the number of pixels along the x and y axes, respectively, and the third dimension includes one element per channel. If you are using grayscale, that’s just one element, so a single-channel image. If it’s RGB, then you need three elements. If it’s CMYK, then you need four elements in the third dimension. Now, in reality, most applications in TensorFlow that have to do with image processing will deal with lists of images. And this is simple enough to represent: we just add one dimension, which specifies the index of a given image in a big list.
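To make the channel counts concrete, here is a small sketch of the three representations for a six-by-six image. NumPy is used here purely for illustration; the same shapes apply to TensorFlow tensors:

```python
import numpy as np

# A 6x6-pixel image in each color format; zeros stand in for real pixel data.
grayscale = np.zeros((6, 6, 1))  # 1 channel: intensity only
rgb       = np.zeros((6, 6, 3))  # 3 channels: red, green, blue
cmyk      = np.zeros((6, 6, 4))  # 4 channels: cyan, magenta, yellow, key

print(grayscale.shape, rgb.shape, cmyk.shape)  # (6, 6, 1) (6, 6, 3) (6, 6, 4)
```

In every case the image is a three-dimensional tensor; only the size of the last dimension, the channel count, changes.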
And this is how we are able to represent lists of images as four-dimensional tensors. This representation is one which we will use momentarily, when we implement character recognition using the MNIST database. There we will have a large list of images, and that list will be represented as one four-dimensional tensor. This is only possible if the images are all of the same size, which is an important bit to keep in mind. Now, say you wanted to represent a list of N images, each of which is six by six pixels, so 36 pixels in total, six horizontally and six vertically.
And if each image is a color one with RGB encoding, then this is the tensor that we would use: (N, 6, 6, 3). This is the shape vector of the four-dimensional tensor that we would use here. Let’s understand each dimension, starting with the last one. The last dimension is always the number of channels; this is three in the case of an RGB color-encoded image. Notice that the channels are always the lowest level of granularity and so always sit on the extreme right of the shape vector. Then come the two dimensions corresponding to the height and the width of each image in the list.
Assuming that our images are six by six pixels, we will need two sixes there. So we’ve now accounted for three out of the four dimensions of the four-dimensional tensor. All that’s left is the first dimension, and this has the value N, because we are going to have N images; it represents the number of images. And this is why, if we want to represent a list of N images where each image is six by six pixels and each pixel contains an RGB value, the shape vector of the corresponding four-dimensional tensor is (N, 6, 6, 3).
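The same stacking can be sketched in NumPy (again, purely as an illustration of the shape semantics; N is an arbitrary value chosen for the example):

```python
import numpy as np

N = 4  # an arbitrary number of images for this illustration
images = [np.zeros((6, 6, 3)) for _ in range(N)]  # N same-sized RGB images

# Stacking along a new first dimension gives the 4-D "list of images" tensor.
batch = np.stack(images)
print(batch.shape)  # (4, 6, 6, 3)
```

The first dimension indexes the image, the next two give its height and width, and the last gives the channels, exactly matching the (N, 6, 6, 3) shape vector described above.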
Let’s now return to the question that we posed at the start of the video. The number of channels required to represent any image in tensor form is equal to the number of quantities that are required to represent the color. In the case of a grayscale image, the color is represented with just one number. In the case of an RGB image, there are three numbers: the codes for red, green, and blue, respectively. In the case of a CMYK image, there are four numbers, and therefore we need four channels, one each for cyan, magenta, yellow, and key, or black. So the correct answer is that four channels are needed for a CMYK image.
16. Lab: Reading and Working with Images
At the end of this lecture, you should know why exactly we would choose to use a coordinator and a queue runner in a TensorFlow program. TensorFlow is widely used in image recognition machine learning algorithms. In this lecture, we’ll take a baby step towards working with images by reading images into our TensorFlow program, resizing them, and then summarizing them in TensorBoard. The code for this example is present in the file named image read and resize with coordinator. Before we look at the code, though, let’s set up the images that we need to read into our program. Click on the folder option at the top in order to create a new folder.
We are going to create a new folder named Images and upload a bunch of images there. These are the images that we’ll process in TensorFlow. Once we have the Images directory, click into it, click on the Upload button, and upload images from your local machine. I’m going to choose my two favorite images here: images of my dogs, Oba and MoJ. Once you’ve uploaded the JPEG images to the Images directory, you can switch over to the code and start executing it. After we’ve imported the libraries that we need, set up an array which points to the images under the Images folder. Here I have just two images: Oba.jpeg and MoJ.jpeg.
Obviously, the images that you might have uploaded into the Images folder will be different. Make sure you update this code cell to reflect the names of your images. Pass this list of images to tf.train.string_input_producer. string_input_producer is a method which takes in the list and produces a queue with the items in that list; we get filename_q. We are going to spawn off multiple threads to read from this queue of images. We will use tf.WholeFileReader to read each image file; this will allow us to read the entire file in one go. And here we have a big block of code to resize and further process these images. Start off by instantiating a session object using the with statement.
In order to read the images in an efficient way, we are going to use a coordinator. A coordinator is a TensorFlow class which allows you to manage and coordinate multiple threads. A coordinator makes thread-related tasks very simple: a single method call on the coordinator will wait for all your threads to finish processing, or you can stop processing on all threads with a single method call, and so on. A queue runner is a TensorFlow abstraction that allows you to process elements in a queue in parallel; it will typically spawn off multiple threads to perform these actions. The tf.train.start_queue_runners method will start all the queue runners reading images from our image list.
In practice, queue runners are often used with coordinators. This combination helps handle a whole bunch of issues which arise when we work with queues using multiple threads, and all of these are abstracted away from you as a developer if you use a queue runner and a coordinator together. Run a for loop, for i in range(len(original_image_list)), in order to read each of the image files into our program. image_reader.read on the filename queue will read one image at a time, and it will read the entire image, because we are using the whole-file reader. The return value of image_reader.read is a tuple. The first field of the tuple is one we can ignore.
That is the name of the file. The second is the actual contents of the file, which we’ll store in image_file. Use tf.image.decode_jpeg to decode this image as a JPEG file; the result of this bit of code is a tensor representation of the image. Once we have the image in tensor format, we can use tf.image.resize_images to resize it into a 224 by 224 image. Call set_shape on the resized image and specify that the shape of the image is 224 by 224, with three as the last dimension. Because this is a color image, the three channels represent the RGB values for each pixel. Once we have resized all the images that we read in to the standard size, let’s quickly print out the shape of each image so we know everything is okay.
Call print(image_array.shape). Everything in TensorFlow is computed only when we call session.run, so we get image_array by calling session.run on our image tensor. Once we have each image represented in three dimensions, height, width, and the number of channels, we call tf.expand_dims on each image tensor. tf.expand_dims is used to expand the dimensions of a tensor; in this case, it will convert a three-dimensional image tensor to a four-dimensional one by adding an extra dimension, which will be the image index. Images represented in four dimensions have an advantage in that a list of images can be represented in a single tensor, and every image in the list can be indexed by the first dimension.
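The shape change that tf.expand_dims performs has the same semantics as NumPy's np.expand_dims, so it can be illustrated like this (a sketch for illustration, not the lab code itself):

```python
import numpy as np

image = np.zeros((224, 224, 3))          # one resized RGB image
batched = np.expand_dims(image, axis=0)  # add the image-index dimension
print(batched.shape)  # (1, 224, 224, 3)
```

The extra leading dimension is what lets several such images be concatenated into one four-dimensional tensor indexed by image number.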
Each image will be processed on a different thread, which is managed by the coordinator. Once we are out of the for loop, we can call coord.request_stop and coord.join, and wait for all our threads to finish reading and processing the images. So far, we’ve only viewed the computation graph on TensorBoard. Let’s use TensorBoard to view the images that we process as well. We use the same summary writer, tf.summary.FileWriter, passing in the session graph. In addition, we can iterate through all the images in our image list using a for loop and make summaries of our images. This we do by calling tf.summary.image. Add all these image summary events to our summary writer and close the writer.
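Putting the pieces together, the lab's pipeline looks roughly like the sketch below. This assumes the TensorFlow 1.x API (string_input_producer, queue runners, and Session were removed in TensorFlow 2), and the file names and log directory are hypothetical stand-ins for whatever you uploaded:

```python
# Hypothetical file names; replace with the names of your own uploads.
IMAGE_PATHS = ["images/oba.jpg", "images/moj.jpg"]
TARGET_HEIGHT, TARGET_WIDTH, NUM_CHANNELS = 224, 224, 3

def read_resize_and_summarize(paths, logdir="./logs"):
    """Read JPEGs from a filename queue, resize them, and write image
    summaries for TensorBoard. Requires the TensorFlow 1.x API."""
    import tensorflow as tf  # imported lazily; TF 1.x assumed

    filename_q = tf.train.string_input_producer(paths)
    image_reader = tf.WholeFileReader()

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        resized_arrays = []
        for _ in range(len(paths)):
            _, image_file = image_reader.read(filename_q)  # (name, contents)
            image = tf.image.decode_jpeg(image_file)
            image = tf.image.resize_images(image, [TARGET_HEIGHT, TARGET_WIDTH])
            image.set_shape((TARGET_HEIGHT, TARGET_WIDTH, NUM_CHANNELS))
            resized_arrays.append(sess.run(image))  # computed only here

        coord.request_stop()
        coord.join(threads)  # wait for all reader threads to finish

        writer = tf.summary.FileWriter(logdir, sess.graph)
        for i, arr in enumerate(resized_arrays):
            # expand_dims adds the batch dimension tf.summary.image expects
            summary_op = tf.summary.image("image_%d" % i,
                                          tf.expand_dims(arr, 0))
            writer.add_summary(sess.run(summary_op))
        writer.close()

    return resized_arrays
```

Each step mirrors the walkthrough above: produce a filename queue, read whole files, decode and resize, fetch with session.run, stop and join the threads through the coordinator, and finally write graph and image summaries for TensorBoard.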
When you execute the block of code that’s present in this code cell, you’ll get an output which prints the dimensions of the two images that we resized. Both are now 224 by 224, with three channels representing their RGB values for each pixel. Let’s take a look at what the computation graph and the images look like in TensorBoard. Start TensorBoard. Click on the link and go to the graphs tab. This is the computation graph for processing images. We processed two images on two different threads, which is why there exist two paths through this graph. The image summaries that we wrote out will be present in the images tab. Here are two images. Image zero is a photograph of my dog Oba, and image one is a photo of my dog, MoJ.
Make sure that you are a good citizen and shut down TensorBoard when you are done with it. You should now know that the coordinator is a class that TensorFlow offers which allows you to manage and work with multiple threads very easily. A queue runner is another abstraction that TensorFlow provides that allows you to work with multiple elements taken from a queue in parallel, using multiple threads. Coordinators and queue runners are often used together in TensorFlow; this combination eases the burden on the developer when dealing with the subtle threading and queuing issues that arise when working on multiple queue elements with multiple threads.