3. Lab: The Vision, Translate, NLP and Speech APIs
Let’s say you’re building an app where your users can take pictures of sign boards in foreign countries and immediately get a translation, and you’re building it on Google Cloud Platform. Which Google APIs would you use to make this happen? In addition to offering you TensorFlow, where you can build machine learning models from scratch, Google also puts a huge variety of pre-trained models at your disposal. You can use these models by calling the corresponding APIs. We’ll look at the Vision, Translate, Natural Language Processing, and Speech APIs in this lecture. You don’t have to build your own model; you can simply take advantage of the treasure trove of data that Google’s models have been trained on and build your apps on top of them. Before you use these APIs, you need to enable them using the API Manager dashboard.
On the dashboard, click on Enable API and then choose the APIs you want to enable one by one. We’ll start off with the Google Cloud Vision API. On the Vision API page, click on Enable. Once you’ve enabled an API, you need to set up credentials to be able to access it. Click on Credentials in the left navigation pane, and you can see Create Credentials right there. If you already know which credentials you need, you can simply choose the right one. Or, if you have a few questions, you can click “Help me choose” to figure out what credentials you need. This takes you to a page that asks you questions about the kind of API you want to access. You can say you want to access the Cloud Vision API, for example.
The next question is: are you using Google App Engine, Compute Engine, or both? I’m just going to say both, because that’s safer. Remember, your Datalab instance runs on Compute Engine, so you are in fact using Compute Engine. It turns out that for the Vision API and the other APIs we’ll use in this lecture, namely Translate, NLP, and Speech, you don’t need any additional credentials to access them. Our examples will use an API key, though, so we’re going to go ahead and create one. If you click on API key, it will generate one for you; you simply copy it and use it within your code. Now, it’s not really safe for me to display my API key to you.
I have since disabled this particular key, though, so it’s fine. Go ahead and enable the APIs for Speech, Translate, and NLP as well. You don’t need to set up new API keys for them; we’ll use the same key for all of them. We are going to use the Datalab instance that we set up earlier in the course, not the new Datalab instance we created for the TensorFlow examples. The earlier Datalab instance was created with the create_vm script in the training-data-analyst/datalab/cloudshell directory, and we mentioned then that creating a Datalab instance in this way performs a bunch of additional setup. One important thing is that in this Datalab instance we have access to the training-data-analyst GitHub repo. I reconnect to Datalab using the start_tunnel.sh script on the Datalab instance.
Move into the folder training-data-analyst/CPB100/lab4c; within that, mlapis.ipynb is our Python notebook, and that’s the one we’ll be using. Now, notice the very first line of code: the first code cell sets up the API key used to access all the APIs. Replace this with your own API key, which you set up earlier; just go over to your API Manager dashboard, copy the key, and paste it into this code cell. We are now ready to access these APIs. Before we get into the actual code, run a quick pip install on Google’s API Python client so that you have the latest version. Before we continue, I’m going to go ahead and clear all cells, that is, all the execution output, so that it’s easier to follow along.
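Roughly, the top of the notebook looks something like this; the key shown here is just a placeholder, so paste in your own, and the pip upgrade is the one just mentioned:

```python
# First code cell of the notebook (sketch): the API key used by all the examples below.
# Replace the placeholder with the key you copied from the API Manager dashboard.
APIKEY = "AIzaSy-your-key-here"

# Make sure the Google API Python client is current; run this in its own notebook cell.
# !pip install --upgrade google-api-python-client
```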
Let’s start off by using the Translate API. In order to access any of these APIs from the Python library, we import the build function from googleapiclient.discovery. You access each of the individual services that Google offers using this build function: pass in the name of the API you want to access, in this case translate, pass in the version, and pass in the developer API key. The next line sets up the text to translate; here we want to translate three different sentences or pieces of text. Then access the service, the translations method within it, call list, and specify a range of arguments: the source text here is in English, let’s say the target is French, and pass in the input text you want translated.
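Put together, that call looks roughly like this; it’s a sketch that assumes the APIKEY variable from the first cell, and the three sample sentences are just placeholders:

```python
from googleapiclient.discovery import build

# Build a client for the Translate API using the API key set up earlier.
service = build('translate', 'v2', developerKey=APIKEY)

# Three placeholder pieces of text to translate.
inputs = ['is it really this easy?', 'amazing technology', 'wow']

# Ask for a translation from English (source) to French (target).
outputs = service.translations().list(
    source='en',
    target='fr',
    q=inputs
).execute()

# Print each input alongside its translation.
for inp, out in zip(inputs, outputs['translations']):
    print(u'{0} -> {1}'.format(inp, out['translatedText']))
```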
The result of the translation is stored in outputs; iterate through outputs and print out the translations. And that’s it, it’s as easy as that. Now, the problem here is that I don’t know any French, so I don’t know how good these translations are. I’m going to change the destination language to one that I can read and ask for translations in Hindi instead. I run this once again and get the results in Hindi, and I must say the results are fairly decent, pretty good in some cases. This was fun, so I’m going to play around with it a little more; this very last one was a near perfect translation. Let’s now play with the Vision API. In this particular example, the code reads an image that is stored on Cloud Storage and performs OCR on it to find out what exactly the text says.
The Vision API can be used for a lot more than optical character recognition; we’ll see that in the next code lab. Here is the image we are going to access and read text from. The access permissions for this image should be set so that it’s readable by all. Get access to the Vision service using the build function as before: we pass in vision as the API we want to access, version v1, and our developer key. Then make a request to vservice.images().annotate(), which makes an HTTP request to the Vision API. Within the JSON body of this request, specify the URL of the image you want to access, and in the features field specify what exactly you want to detect within the image: we want text detection, and we want the top three results for text detection on this image. Make the HTTP request and print the response. You’ll find that the responses you get when you detect anything in an image are huge; you get a whole host of information within this JSON result, and you’ll need to access the specific fields where the result you’re looking for is stored. Here we’ve accessed the field for what language the text is in, and the actual text. And here’s the text, with the language code zh for Chinese. Well, if I’m a foreign tourist in China, just getting the text of this sign board is not really going to help me; I still can’t read it. I now need to translate this text into a language that I understand.
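Here’s a minimal sketch of the text-detection call just described; it assumes the APIKEY from earlier, and the Cloud Storage URI is a placeholder for your own publicly readable image:

```python
from googleapiclient.discovery import build

# Build a client for the Vision API.
vservice = build('vision', 'v1', developerKey=APIKEY)

# Placeholder: a publicly readable image of a sign board stored in Cloud Storage.
IMAGE = 'gs://my-bucket/sign.jpg'

# Ask for text detection (OCR) on the image, keeping the top three results.
request = vservice.images().annotate(body={
    'requests': [{
        'image': {'source': {'gcsImageUri': IMAGE}},
        'features': [{'type': 'TEXT_DETECTION', 'maxResults': 3}]
    }]
})
responses = request.execute(num_retries=3)

# The first textAnnotation carries the full detected text and its language code.
foreigntext = responses['responses'][0]['textAnnotations'][0]['description']
foreignlang = responses['responses'][0]['textAnnotations'][0]['locale']
print(foreignlang, foreigntext)
```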
For this we use the Translate API, which we’ve seen before. Specify this text as the input and call the Translate API, specifying that the source is the foreign language code you just got and that the target is English. Once you get the response, simply print it out, and there you see it: the message in English. There still seems to be something lost in translation, but at least you have some idea of what the sign is about.
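Chaining the two APIs together is then just a matter of feeding the Vision output into the Translate call; here’s a sketch, reusing foreignlang and foreigntext from the Vision snippet above:

```python
from googleapiclient.discovery import build

# foreignlang and foreigntext come from the Vision sketch above
# (for example 'zh' and the Chinese sign text).
service = build('translate', 'v2', developerKey=APIKEY)
outputs = service.translations().list(
    source=foreignlang,   # the detected language code
    target='en',          # translate into English
    q=[foreigntext]
).execute()
for output in outputs['translations']:
    print(output['translatedText'])
```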
The third API we’ll explore here is the one for sentiment analysis, that is, natural language processing. There are a bunch of famous quotes here, and we’ll evaluate the sentiment of each of them. First, set up access to the API service using the build function as before: specify that it’s the language API we want, version v1beta1, and that the developer key is APIKEY. Here are the various quotes we’re going to analyze, specified in the form of an array. The language service’s documents().analyzeSentiment() method gives us the sentiment for each of these quotes. In the request body, specify the kind of document you are passing in; here we’re just passing in plain text. We make this request for each individual quote using a for loop. From the response, extract the two important properties of the sentiment: the polarity, that is, whether the sentiment expressed was positive or negative, and the magnitude, that is, how positive or negative the sentiment was.
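A sketch of that loop, with a couple of placeholder quotes standing in for the ones in the notebook:

```python
from googleapiclient.discovery import build

# Build a client for the Natural Language API.
lservice = build('language', 'v1beta1', developerKey=APIKEY)

# Placeholder quotes; the notebook uses its own list of famous quotes.
quotes = [
    'It is almost always his own fault if a man fails.',
    'Wherever you go, go with all your heart.'
]

for quote in quotes:
    # Each quote is sent as a plain-text document.
    response = lservice.documents().analyzeSentiment(
        body={
            'document': {
                'type': 'PLAIN_TEXT',
                'content': quote
            }
        }).execute()
    # Polarity: positive or negative; magnitude: how strongly so.
    polarity = response['documentSentiment']['polarity']
    magnitude = response['documentSentiment']['magnitude']
    print('POLARITY=%s MAGNITUDE=%s for %s' % (polarity, magnitude, quote))
```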
And here are the results; you’ll find that they are pretty interesting. The very first quote is a positive one with a magnitude of 0.8, and when you read the actual quote, I think that makes sense. The result for the second quote is a little off: it says it’s a negative quote with a magnitude of zero, which means the model can’t really tell whether it’s positive or negative but leans slightly negative. If anything, that’s not really true; the sentiment analysis gets this one a little wrong. Let’s move on to speech recognition. At this point you’re familiar with how we instantiate the service to access any API; do the same thing here, where the service is speech. Then make a request to the service’s speech().syncrecognize() method.
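Roughly, that call looks like this; the audio URI, encoding, and sample rate here are placeholders that you would match to your own clip:

```python
from googleapiclient.discovery import build

# Build a client for the Speech API.
sservice = build('speech', 'v1beta1', developerKey=APIKEY)

# Synchronous recognition on a short raw audio clip stored in Cloud Storage.
response = sservice.speech().syncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',   # placeholder: match your clip's encoding
            'sampleRate': 16000       # placeholder: match your clip's sample rate
        },
        'audio': {
            'uri': 'gs://my-bucket/audio.raw'   # placeholder clip
        }
    }).execute()

best = response['results'][0]['alternatives'][0]
print(best['transcript'])
print('Confidence=%f' % best['confidence'])
```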
This request contains a request body which specifies what audio you actually want to perform speech recognition on; you want to identify what’s said in this raw audio clip. Listen to the clip yourself if you want to; it’s basically someone asking, “How old is the Brooklyn Bridge?” The response from this API also includes a score for how confident it is that its speech recognition was accurate: 98.7% in this case. So, coming back to the opening question: if you were building an app where you could take a photograph of a sign board and have its text translated for you, you would use the Vision API to detect the text, and then the Translate API to get it into a language the user can read.
4. Lab: The Vision API for Label and Landmark Detection
If we wanted to use the Vision API to identify famous places in the world, such as the Eiffel Tower or the Taj Mahal, what feature type would we specify in our request to the Vision API? In this lecture, we’ll use the Vision API for image recognition: labels and faces, as well as famous places. Before you use the Vision API, you need to ensure that it’s enabled; this you can do using the API Manager dashboard. Also ensure that you have an API key set up, since an API key is what we’ll use to access the Vision API. In this lab, we will make HTTP requests to the Vision API, and we’ll do this from the Cloud Shell command line using the curl utility. Let’s choose some images to use with the Vision API. First up, I want some images of chocolate cake; here is a really nice specimen.
This is what I’m going to download and store in my Cloud Storage bucket; I’ve already done this earlier. You are, of course, free to choose any image you find interesting. So there is chocolate-cake.jpg in my loony-us bucket. The parameters of the request we make to the Vision API, the request body, I’m going to store in a JSON file, which I’ll call request-cake.json, and this is what it looks like. Within the request, we specify the image it has to work on, and in addition, the feature we are interested in, which is label detection: we want Google’s API to detect what kind of image this is, what this image represents. Store the API key in an environment variable so you can use this variable to reference the key rather than specifying the entire key for every request.
In order to allow the Vision API to access this image in our bucket, we need to update its permissions; this you can do with the gsutil acl ch command, setting the image to be readable by all users. Here is the curl HTTP request to the Vision API: it’s a POST request, the request body is in the request-cake.json file, and here is the URL to which we make the request, https://vision.googleapis.com/v1/images:annotate, with your API key specified as a query parameter. You’ll find that for this particular chocolate cake, the results are uncannily accurate: the first result, the one with the highest score, is chocolate cake with 95% confidence, and the remaining results are also very close: chocolate, dessert, and so on.
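The lab drives this from curl, but the same call can be sketched in Python with the requests library if that’s easier to read; the bucket, file name, and key here are placeholders:

```python
import requests

API_KEY = 'YOUR_API_KEY'    # the key you stored in an environment variable
request_body = {            # same structure as request-cake.json
    'requests': [{
        'image': {'source': {'gcsImageUri': 'gs://my-bucket/chocolate-cake.jpg'}},
        'features': [{'type': 'LABEL_DETECTION'}]
    }]
}

# POST the body to the images:annotate endpoint, passing the key as a query parameter.
resp = requests.post(
    'https://vision.googleapis.com/v1/images:annotate',
    params={'key': API_KEY},
    json=request_body,
)

# Print each label with its confidence score.
for label in resp.json()['responses'][0]['labelAnnotations']:
    print('%.2f  %s' % (label['score'], label['description']))
```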
Let’s try this API with an image that’s a little harder to understand: that of a scone. A scone looks like many other desserts, especially in this photograph. I’ve downloaded this particular image and stored it in my Cloud Storage bucket as blueberry-scone.jpg. Switch over to the Cloud Shell command line; in request-scone.json I’ve specified the request to the Vision API. The source is the JPEG file we just saw, and the feature we’re interested in is again label detection: we want to know what this image represents. Ensure the API has permission to access this particular image by setting it to be readable by all users, then run the same command as before, just pointing to the new JSON file, request-scone.json. And if you notice here, the results are less wow-inducing.
It says baked goods, which is not as good as chocolate cake; this generic label is what the Vision API has the highest confidence in. If you scroll down the results, you’ll notice that, with less confidence, it thinks it could be something to do with blueberry, or that it could be a scone. Let’s see how good the Vision API is at detecting faces and landmarks. Here are a bunch of people taking selfies in front of the Eiffel Tower. I’ve downloaded this image and stored it as selfie.jpg in my Cloud Storage. Set up a JSON file for the request, call it request-faces-landmarks.json, and specify the image the Vision API has to look at. This time, though, we are interested in different features of this image: we’re interested in detecting faces and also in detecting landmarks.
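The request body differs from the cake example only in the features list; sketched here as a Python dict (the JSON file has the same structure, and the bucket and file names are placeholders):

```python
# Sketch of the body that goes into request-faces-landmarks.json:
# the same image, but with two feature types requested at once.
request_body = {
    'requests': [{
        'image': {'source': {'gcsImageUri': 'gs://my-bucket/selfie.jpg'}},
        'features': [
            {'type': 'FACE_DETECTION'},
            {'type': 'LANDMARK_DETECTION'}
        ]
    }]
}
```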
Make sure that the Vision API can read selfie.jpg by setting up the permissions correctly, and then make the curl request to the server, making sure you pass in the right JSON file. You’ll see that the JSON response when we detect faces and landmarks is very large. There are a whole bunch of bounding polygons, with various vertices, which point out the various interesting characteristics within this image. The breadth of information here is pretty mind-blowing: for every facial feature, left eye, right eye, left of left eyebrow and so on, you’ll get coordinates indicating exactly where it’s situated, for every human in this photo. And finally, at the very end, you’ll get a summary of what the API has detected.
It gives you a bunch of information about what it has detected in each person’s face: joy likelihood, very likely, meaning it’s a happy person; sorrow likelihood, very unlikely. The only person who seems really joyful in this image is the person outlined on screen. Here is the result for another of the individuals; you can see that it’s not really clear what emotions that person is expressing: joy likelihood is unlikely, and sorrow likelihood is also unlikely. This could apply to the faces of either of these two individuals; if you examine the JSON data in some detail, you will find that there are also coordinates which tell you which face the analysis is for. Notice, though, that the Vision API wasn’t able to detect the Eiffel Tower as a landmark in this particular image.
Maybe the image was too fuzzy, maybe there wasn’t enough detail; it could be any of a number of reasons. Let’s try another image where the Eiffel Tower is depicted a little more clearly. I saved this image to my Cloud Storage as eiffel_selfie.jpg and set up a JSON file with the request body; this time I want to do just landmark detection, as I’m not interested in the faces here. Change the permissions on the image file so that it’s accessible, then make the HTTP request to the server using curl. Recognizing this is very easy for Google’s API: there you see it, the description is Eiffel Tower. So, to answer the opening question: if you want the Vision API to detect famous places, you would specify the feature type to be landmark detection.