3. NN Introduced
Here is a question that I’d like us all to keep in mind as we go through the contents of this video: are neural networks a type of representation learning system? That’s the question to keep in mind, and we’ll revisit it at the end of this video. Let’s now turn our attention to understanding deep learning, which is fast becoming one of the hottest buzzwords out there today. Let’s pick up from where we left off. We spoke about how traditional ML-based systems still require human experts to tell those systems what features of the data to focus on. And as the ML-based system that we are trying to build becomes more and more complex, the role of those feature selection experts becomes more and more important.
Representation learning systems, on the other hand, go one step further: the feature selection step is now carried out by an algorithm rather than by a set of human experts. So representation-learning-based systems, in a sense, figure out by themselves what features in the data need to be paid attention to. This becomes extremely important as the raw data and the feature vectors become harder and harder for human experts to work with. It’s relatively easy for a human expert to tell a machine-learning-based classifier to focus on how a mammal, or any animal, breathes or how it gives birth. And it’s only easy because these are attributes which experts have spent decades studying.
But this shifts the burden from the expert to the classifier, because now, for the classifier to do its thing, you actually need a live animal: an animal where you know how it breathes and how it gives birth. Let’s say that you only have images or videos of whales, or of animals in general; then a representation-based ML system is your only bet. The system is going to have to be smart enough to parse those representations, those pictures or videos of different animals, use them to decide what features are important, and then on that basis come up with the appropriate label.
So representation-based learning systems try to figure out by themselves what features are important, and the most common and currently popular type of representation learning happens through something known as deep learning. Deep learning itself is a slightly generic term which has mostly come to refer to one specific type of ML system, and that is the neural network. So again, deep learning systems are those in which algorithms learn on their own what features really matter, and neural networks are a specific type of deep learning algorithm, the most common and the most popular. TensorFlow is quite versatile; it can do a bunch of stuff.
But really, what’s making TensorFlow the hottest tool in town is its facility with deep learning and neural networks. Neural networks, in turn, are composed of simple building blocks called neurons. We’ll get to the details of what exactly a neuron does and how neural networks learn, but here is a high-level schematic representation of how a deep-learning-based binary classifier would work. This would be a complex system which would take in a corpus of images and feed these into different layers. One of the layers would deal with, say, raw pixels.
The layer after that would abstract those pixels, or group them, and work with edges. Edges and other features of the image would in turn be passed to yet another layer which might recognize corners, and so on and so forth, until finally we would reach a final output layer which deals with object parts and which would then, on the basis of all of this, classify the image as being that of either a fish or a mammal. That’s how a neural network would function as an ML-based classifier. Each of these layers consists of building blocks called neurons, which we will get to in a moment. The layers that the programmer deals with directly, the input layer which takes in the pixels and the output layer which actually classifies the images, are known as visible layers.
The other layers, the ones which lie in between, are hidden, and really, you as the designer of a neural network have very little idea what’s going on inside the network as a whole. This is pretty much a black box, and that can be really disconcerting even to experienced folks in the field of machine learning, because in deep learning networks, unlike in other learning mechanisms, the lack of transparency means that you’ve just got to trust the model. You don’t understand why the model is making a certain prediction; you’ve just got to hope that it knows what it’s doing. This was just a schematic example.
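Before generalizing, here is a rough preview of what a single neuron computes: a minimal NumPy sketch, with a sigmoid as one common (but not the only) choice of activation; the real details come later in the course.

```python
import numpy as np

def neuron(x, w, b):
    # A neuron computes a weighted sum of its inputs plus a bias,
    # then passes the result through a nonlinear activation
    # (a sigmoid here, squashing the output into the range (0, 1)).
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

# Three inputs, three weights, and one bias produce one output.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(neuron(x, w, b=0.2))
```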
In general, a neural network consists of layers, and each layer is a set of similar neurons, neurons which are similar in their structure. More on that in a moment. The term deep learning actually refers to the fact that a neural network has a bunch of layers arranged in depth; the layers in a neural network are what give the term its name. As we’ve said on multiple occasions, the individual layers consist of neurons, and there are complex interconnections between the neurons in these layers. In fact, it is possible that with too many interconnections we end up fitting our training data too well, and we have to randomly drop some of the connections during training. This technique is known as dropout.
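To make the idea of stacked layers and dropout concrete, here is a minimal sketch using the tf.keras Sequential API (TF 2.x assumed); the layer sizes and the 28x28 input are placeholders, not recommendations.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),   # visible input layer: raw pixels
    tf.keras.layers.Dense(128, activation="relu"),   # hidden layer
    tf.keras.layers.Dropout(0.5),                    # randomly silence half the units while training
    tf.keras.layers.Dense(64, activation="relu"),    # another hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # visible output layer: fish vs. mammal
])
model.summary()
```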
Dropout is just one example of the process, and of the challenges, in designing neural networks. The challenge here is in creating a good map, a good architecture, with all of these layers: making the right choices about the type of neural network, the interconnections between the neurons in those different layers, and so on. For instance, for image processing you would typically use a convolutional neural network, while for text processing you would often use a recurrent neural network. It is decisions like these, the type of neural network, the manner in which you represent pixels, the manner in which you set up layers to process groups of pixels, that constitute the real work.
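Purely as an illustrative sketch, with made-up shapes and vocabulary size, and again using the tf.keras API, here is what that choice of architecture looks like in code:

```python
import tensorflow as tf

# Convolutional layers for images...
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# ...and recurrent layers for sequences such as text.
rnn = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```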
The challenges in creating deep learning applications lie in making, architecting, and tweaking such choices, and a large part of the reason for TensorFlow’s popularity is that such choices are made very easy: they are abstracted away for you to a very large extent. In any case, coming back to our introduction to neural networks: these are a type of representation learning system, a type of deep learning system, the most popular type out there, in fact. Neural networks consist of layers, and each layer consists of individual interconnected neurons. Neural networks really come into their own when the data set is so large and so complex that not even human experts have a successful track record of finding patterns in it.
Let’s return to the question we posed at the start of this video. Neural networks are indeed a type of representation learning system. Or, to put it a little more precisely, neural networks can be used to build representation learning systems, because neural networks can pick up on the features in the data that really matter. If we set up enough neurons with enough complex interconnections, the neural networks will figure out for themselves what features really count. And that is the defining characteristic of a representation learning system.
4. Introducing TF
Here is a question that I’d like you to keep in mind as you go through the contents of this video: why is it that TensorFlow is so popular for neural networks? What is it about TensorFlow’s structure that lends itself so naturally to the construction of complex neural networks? Let’s now turn our attention to TensorFlow, which is a specific library, a specific technology, which is incredibly popular and powerful for machine learning these days. Let’s be really clear about what TensorFlow is: it is an open source software library for numerical computation using data flow graphs. This is something that we get from the TensorFlow website, and different bits of this definition are worth paying attention to.
The first is that it is open source, so anyone can use it; this was very graciously open sourced by Google. The second is that this is actually a generic numerical computation library, so there’s nothing which says that you can only use it for machine learning or neural network applications. And the third is that it models these computations as data flow graphs. This idea should not come across as a very new one: if you’ve used other bits of software like Apache Flink, or Dataflow on the Google Cloud Platform, you’d see that the idea of representing operations on data in the form of a directed acyclic graph is actually a pretty old and well-established one. We’ll get back to the computation graph in a fair bit of detail.
Let’s keep moving and consider some of the formidable advantages of TensorFlow; these are all reasons why TensorFlow has become so popular. For one, it’s distributed. As we’ve already seen, the ability to run a complex computation on a cluster of machines, on multiple CPUs or GPUs, is increasingly important as data sets get too big to fit in memory on a single machine. A second important advantage of TensorFlow is that it lies at the heart of an entire suite of software tools. In addition to a bunch of powerful libraries available within TensorFlow, there is also TensorBoard, which is a visualization tool you can access via the browser. Another associated technology is TensorFlow Serving.
This is a way of deploying trained machine learning models into production, and it’s quite sophisticated: for instance, there are ways for clients to automatically pick up updates to the trained TensorFlow models. These are formidable advantages. Let’s also try and place TensorFlow in the context of other tools in machine learning. Let’s try and understand its users, its advantages or strengths relative to some other technologies, and also some of the challenges that you ought to expect if you decide to work with it. Let’s start with the common users. By far the most important use of TensorFlow is in the research and development of new machine learning algorithms and applications.
Implementing a neural network, architecting it, making all of the design choices: this is a fair bit of configuration work, and TensorFlow really does its best to abstract the hard parts away from you. So, again, the number one use case for TensorFlow is in researching and developing new neural-network-based applications. TensorFlow also has proper support for taking models from training to production: there is distributed training, and there is also TensorFlow Serving, which provides the sophisticated deployment mechanisms we just discussed very briefly. TensorFlow works really well for large-scale distributed models and in applications like mobile and embedded systems as well.
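To make the deployment story slightly more concrete, here is a minimal sketch of exporting a trained model in the SavedModel format, which is what TensorFlow Serving loads; the path and the numbered version directory are illustrative.

```python
import tensorflow as tf

# A trivial stand-in for a trained model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Export to the SavedModel format under a numbered version directory;
# TensorFlow Serving can be pointed at /tmp/my_model and will serve
# the highest version it finds, picking up new ones as they appear.
tf.saved_model.save(model, "/tmp/my_model/1")
```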
If you are surprised that TensorFlow is able to play at both ends of the field, as it were, at both ends of the spectrum of complexity, from large distributed systems to mobile and embedded devices, it’s because Google constantly releases different versions of TensorFlow; for instance, there are lightweight versions available for use in mobile apps. It’s also possible to pretrain a model on a very large amount of data and then make that model available so that other users can tweak maybe one layer of the neural network and use it to their own ends, even on a lightweight device. Let’s also quickly iterate through some of TensorFlow’s strengths. Maybe the most important of these is its ease of use: it has an easy and stable Python API.
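As a hedged sketch of that "pretrain, then tweak one layer" idea: the snippet below freezes a model pretrained on a large image dataset and trains only a small new head on top. MobileNetV2 is just one lightweight choice here, and the input shape and class count are placeholders.

```python
import tensorflow as tf

# Load a network pretrained on ImageNet, without its classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained layers; reuse them as-is

# Stack a small new head on top; only this part gets trained.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```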
It runs well on both large and small systems: as we just discussed, it’s actually possible to train a model on powerful hardware and then make it available, with slight changes, on a much smaller and simpler system. TensorFlow is also very efficient; it affords great performance because a lot of its implementations have been tweaked and really optimized by Google. And it has all of the additional tools we mentioned, like TensorBoard for visualization and TensorFlow Serving for deployment. There are, of course, challenges that you ought to expect to face if you decide to go with TensorFlow. One of these has to do with the fact that support for distributed TensorFlow is still rather nascent.
Another is that libraries are being developed at a frantic pace, so libraries get deprecated, replaced, or subsumed every six months or a year. But maybe the most significant challenge of working with TensorFlow is that writing custom models, building custom neural networks, is not straightforward. If you’d like to use one of the inbuilt, cookie-cutter models that are made available for your use, that’s very simple and straightforward; but defining a neural network and wiring up all of the neurons by hand is challenging. This is inherently challenging, and there is only so much that TensorFlow can do to simplify or abstract the process for you. This is also a great place to talk about the central abstraction in TensorFlow.
Computations in a neural network are modeled using a directed acyclic graph, an abstraction which is used in other technologies as well; Apache Flink jumps to mind, as does Dataflow. So we have a network: a graph with nodes which represent computations, and edges between those nodes which are data items. Those data items are called tensors, and that’s where the name TensorFlow comes from. We’ll have a lot more to say on the computation graph and on tensors in just a little bit. Let’s return to the question we posed at the start of this video. The reason that TensorFlow is so popular these days is that it’s become the tool of choice for building complicated neural networks.
And the reason for that is that TensorFlow and neural networks both rely on the same underlying abstraction: interconnected nodes which represent operations to be carried out on data items, and edges between those nodes which represent those data items. The TensorFlow computation graph, or directed acyclic graph, exactly mirrors the setup of a neural network. Remember that neural networks consist of neurons, which are basic operations, and these neurons are wired up in complex ways. TensorFlow easily allows one to create nodes which represent individual neurons and then interconnect those nodes, or neurons, by passing data between them. Those data items are the tensors, and that’s also where the name TensorFlow comes from.
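As a closing sketch (TF 2.x assumed), tracing a small function shows this abstraction directly: the traced graph’s nodes are operations, and its edges are tensors.

```python
import tensorflow as tf

@tf.function
def layer(x, w, b):
    # One "neuron layer" expressed as graph operations: matmul -> add -> relu.
    return tf.nn.relu(tf.matmul(x, w) + b)

# Trace the function to obtain its computation graph.
graph = layer.get_concrete_function(
    tf.TensorSpec([None, 3], tf.float32),
    tf.TensorSpec([3, 2], tf.float32),
    tf.TensorSpec([2], tf.float32),
).graph

# Nodes are operations; the edges between them are tensors.
for op in graph.get_operations():
    print(op.type, "->", [t.name for t in op.outputs])
```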