4. Scan vs Query API Calls
Let’s start with a query. What is a query? Well, basically, a query operation finds items in a table using only the primary key attribute values.
So you must provide a partition key attribute name and a distinct value to search for. So, to give you an example, you might select this item where the ID is equal to two one two, and so that will bring up user number two one two, and it will bring up basically everything in that item. So for all the different attributes, it’s going to bring up their first name, their surname, et cetera. You can also optionally provide a sort key attribute name and value and then use a comparison operator to refine the results. So this would be like if you were looking at a forum and you wanted to say, “Okay, bring up user two one two, and I want to know what they’ve done in the last seven days, so I want to see what they posted from this date to this date.” So that’s where you would do a query, and you’re using a sort key attribute as well as the primary key or the partition key attribute. And like I was saying before, by default, a query returns all of the data attributes for items with the specified primary keys.
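To make that concrete, here’s a minimal boto3 sketch of a query with both a partition key condition and a sort key range. The table and attribute names (Forum, UserId, PostDate) and the date values are hypothetical, just for illustration:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Forum")  # hypothetical table name

# Query for user 212, refined by a sort key range: only posts
# made between two dates, using a comparison operator (between).
response = table.query(
    KeyConditionExpression=Key("UserId").eq("212")
    & Key("PostDate").between("2024-01-01", "2024-01-07")
)
for item in response["Items"]:
    print(item)  # by default, every attribute of each item comes back
```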
So by default it’s going to give you every single attribute in that item: the first name, the last name, the email address, et cetera. However, you can use the projection expression parameter so that the query only returns some of the attributes rather than all of them. And that’s a really important thing to remember going into the exam: if you don’t want to return all attributes within an item, you can use the projection expression parameter so that the query only returns the data that you want. You might just want the user’s email address, for example.
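Continuing the sketch above (the EmailAddress attribute name is made up), a projection expression looks like this:

```python
# Only the email address comes back, not the whole item.
response = table.query(
    KeyConditionExpression=Key("UserId").eq("212"),
    ProjectionExpression="EmailAddress",
)
```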
Okay, so when you run a query, the results are always going to be sorted by the sort key, if you have one. So if the data type of the sort key is a number, the results are returned in numeric order, and they’re going to be in ascending order. So it will be your lowest number first: user ID 1, followed by 2, then 3, and so on. Otherwise, if it’s text, the results are returned in order of the character code values, and by default, the sort order, like I said, is always ascending. You can reverse the order, however. There’s a parameter called ScanIndexForward, and by default it’s set to true. All you need to do is set ScanIndexForward to false when you’re running the query, and then the results come back in descending order, which is important to remember going into the exam. Remember to use the ScanIndexForward parameter to change the order of the results from ascending to descending.
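Here’s what that looks like, continuing the same hypothetical query:

```python
# ScanIndexForward defaults to True (ascending by sort key);
# setting it to False returns results in descending order.
response = table.query(
    KeyConditionExpression=Key("UserId").eq("212"),
    ScanIndexForward=False,
)
```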
And by default, again, queries are always going to be eventually consistent. So if your application requires strongly consistent reads, you can change that in the query. Remember, when you’re writing to DynamoDB, because it is spread across three different locations, you can get some replication lag, so if you need your reads to be strongly consistent, you can change that when you run the query. So that’s a query, and queries are pretty straightforward. What is a scan? Well, basically, a scan is even easier. A scan operation examines every item in the table; it’s basically dumping the table. So by default, a scan returns all of the data attributes for every item. Just like with a query, however, you can use the projection expression parameter so that the scan only returns some of the attributes rather than all of them.
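In boto3, both of those are one-parameter changes (still using the hypothetical names from above):

```python
# A strongly consistent query (the default is eventually consistent).
response = table.query(
    KeyConditionExpression=Key("UserId").eq("212"),
    ConsistentRead=True,
)

# A scan examines every item; ProjectionExpression trims what comes back.
response = table.scan(ProjectionExpression="EmailAddress")
```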
So what should I use, the query or the scan? Well, I think it’s pretty obvious that in most cases you want to run a query operation because it’s more efficient than a scan operation. Remember, a scan operation is going to dump the entire table. The scan operation always scans the entire table, then filters out values to provide the desired results, essentially adding the extra step of removing data from the results set. Avoid using a scan operation on a large table with a filter that removes many results, if possible. Also, as a table grows, the scan operation slows. The scan operation examines every item for the requested values and can use up the provisioned throughput for a large table in a single operation. So think about that: when you run a scan operation against your largest table, you can completely max out your provisioned throughput just by hitting the scan button. So for quicker response times, design your tables in a way that can use the Query, GetItem, or BatchGetItem APIs instead. Or alternatively, design your application to use scan operations in a way that minimises the impact on your table’s request rates.
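One common way to do that (a sketch on the same hypothetical table, with made-up page size and back-off values) is to scan in small pages and pause between them:

```python
import time

# Rate-limited scan: small pages plus a pause between requests keep a
# full-table scan from consuming all of the provisioned throughput at once.
items, start_key = [], None
while True:
    kwargs = {"Limit": 25}  # small page size (made-up value)
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    page = table.scan(**kwargs)
    items.extend(page["Items"])
    start_key = page.get("LastEvaluatedKey")
    if not start_key:
        break  # no more pages
    time.sleep(0.1)  # back off between pages (made-up value)
```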
So what are my exam tips? Well, a query operation finds items in the table using only the primary key attribute values, and you must provide a key attribute name and a distinct value to search for. So just remember, if you’re running a query, you’re always going to need the key attribute name, so the ID, and then the value you’re searching for, so it might be ID one two three. It’s also important to remember that a scan operation is going to examine every item in the table. By default, a scan returns all of the data attributes for every item, but you can limit this using the projection expression parameter so that the scan only returns some of the attributes rather than all of them. Remember that query results are always sorted in ascending order by the sort key. If you have a sort key and want to reverse the order, set the ScanIndexForward parameter to false. And then finally, just remember to try and use a query operation over a scan operation because it’s far more efficient: you’re not dumping the entire table and then filtering out what you don’t need.
5. DynamoDB & Provisioned Throughput
So let’s go through the provisioning process. If you remember from the earlier lectures, DynamoDB allows us to set a read provisioned throughput and a write provisioned throughput. So let’s start with a read provisioned throughput unit, and it’s pretty simple. All reads are rounded up to increments of 4 KB in size. Now, if you remember, we had two different models. We have eventually consistent reads, which is the default model, and one read unit can handle two eventually consistent reads per second. We then have strongly consistent reads, where one read unit handles one read per second.
Let’s now look at our write provisioned throughput. And basically, it’s even simpler than read provisioned throughput. All writes are 1 KB in size, and one write unit handles one write per second. Okay, now I know what you’re thinking. This is starting to sound really complicated. Where’s Ryan going with this? I’m not sure I’m going to like this. The next thing I’m going to do is just show you the magic formula, and then we’re going to go through and practise a whole bunch of different examples. And I promise you, it becomes really, really simple once you’ve practised. And this is the way I managed to get really good at math in high school and at university: I would read the theory once, and then I’d actually just go and do the questions and do them over and over again until it clicked. Okay, so let’s look at an example. And this is very much what you’re going to see in the exam. So it’s going to be similar to this. You have an application that requires you to read ten items. And remember, an item is just a row in your DynamoDB tables, and it needs to read these ten items, each 1 KB in size, per second using eventual consistency.
What should you set the read throughput to? And the way you calculate this is basically to take the size of the read, round it up to the nearest 4 KB chunk, and then divide it by four. You then multiply that by the number of items, and that’s going to give you your read throughput. And then you have to decide whether it’s eventually consistent or strongly consistent. If it’s eventually consistent, you need to divide that number by two. If it’s strongly consistent, you just leave it as it is. And remember, this is for read throughput only. Okay, so let’s move on to an example where we can actually have a look at this in action. So you have an application that requires you to read ten items of 1 KB each per second using eventual consistency. What should you set the read throughput to? So the first thing we need to do is calculate how many read units per item we need. So we round that up to the nearest 4 KB increment, which is four, and then we divide it by four. So four divided by four means we need one read unit per item. Okay, and then how many items do we have? Well, we have ten items, so we multiply the number of read units per item by the number of items that we have, and that gives us ten.
And then we look at the consistency model. So we’re using an eventually consistent model, so we basically take that ten and divide by two, and it gives us five units of read throughput. If we were using strong consistency, we would simply divide by one. So ten divided by one would be ten. But in this case, we actually only need five units of read throughput. So let’s go through another example, and I promise it gets easier and easier as you go through it. So you have an application that requires you to use eventual consistency to read ten items of 6 KB each per second. What should you set the read throughput to? So the first thing we do is calculate how many read units per item we need. So we take the 6 and round it up to the nearest increment of 4 KB, which is 8 KB. We then take 8 and divide it by 4. So eight divided by four equals two, and we need two read units per item. We then take the number of read units per item and multiply it by the number of items we actually have. So that’s ten items, which gives us 20. And again, because we’re using eventual consistency, we divide by two, which equals ten.
So we’re going to need ten units of read throughput. Let’s do another example. And it gets easier, guys, honestly; as you start going through it, it becomes a pattern. So you have an application that requires you to read five items of 10 KB each per second using eventual consistency. What should you set the read throughput to? So again, we need to calculate the number of read units per item that we need. So we take our 10 and round it up to the nearest 4 KB increment, which is going to be twelve, and then we divide it by four. So we need three read units per item. Then how many items do we have? Well, we have five items, so three times five is 15, and we’re using eventual consistency. So we divide by two, and that’s going to give us 7.5. So, wait a minute, can we actually use 7.5 in the console? Let’s take a look. So here I am in the AWS console, and if I just go over to my product catalogue table and go over to my capacity and try to change my read capacity units to 7.5, it says no, it has to be an integer. So we always have to round it up.
So if we get a number like 7.5, we round it up to eight. And you can see here that we’ve got eight units of read throughput. Okay? So it’s pretty simple for read throughput. Let’s just do one more where we’re not using eventual consistency; we’re using strong consistency. So we have an application that requires you to read five items of 10 KB each per second using strong consistency. What should you set the read throughput to? So first we need to calculate how many read units per item we need. Our item is 10 KB in size, and we round that up to the nearest increment of four, so it’s going to be 12 KB. We divide that by four, so we’re going to need three read units per item. And then, because we need strong consistency rather than eventual consistency, we don’t divide by two; we just leave it as is. So it’s going to be 15 units of read throughput.
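All of those worked examples follow one formula, so here’s a small Python sketch of it (the helper name is my own, not an AWS API):

```python
import math

def read_capacity_units(item_size_kb, items_per_second, strongly_consistent=False):
    """Round each item up to the nearest 4 KB, divide by 4, multiply by the
    number of items per second, and halve it for eventual consistency."""
    units_per_item = math.ceil(item_size_kb / 4)
    units = units_per_item * items_per_second
    if not strongly_consistent:
        units /= 2
    return math.ceil(units)  # the console only accepts integers

# The worked examples from this lecture:
assert read_capacity_units(1, 10) == 5
assert read_capacity_units(6, 10) == 10
assert read_capacity_units(10, 5) == 8   # 7.5 rounded up to an integer
assert read_capacity_units(10, 5, strongly_consistent=True) == 15
```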
Okay, so let’s look at write throughputs. And write throughputs are much easier than read throughputs. And the reason for this is that you don’t have to worry about a read unit being 4 KB: a write unit is simply 1 KB in size. You don’t have to worry about rounding up to the nearest increment of four, and you don’t have strongly consistent writes or eventually consistent writes. It’s all just one model. So it’s really, really simple. Let’s look at a question. So we have an application that requires you to write five items, with each item being 10 KB in size, per second. What should you set the write throughput to? Well, each write unit handles 1 KB of data, so that’s really easy. You need to write five items per second, and each item is 10 KB in size. So five times ten equals 50.
That means you’ll need 50 write units, so your write throughput would be 50 units. Let’s look at a second example. So you have an application that requires you to write twelve items of 100 KB each per second. What should you set the write throughput to? Again, each write unit handles 1 KB of data. You’re going to need to write twelve items per second, with each item being 100 KB in size. Twelve times 100 equals 1200, so you need a write throughput of 1200 units. So write throughput is really, really simple. You just multiply the two numbers out, and there’s the number of write units that you need. With your read units, however, you basically have to round up to the nearest increment of four, then divide by four, and then multiply that out by the number of items that you have. And then, if it’s eventually consistent, you divide by two. If it’s strongly consistent, you don’t divide by anything. So we’re going to have lots of practise quiz questions at the end of this section of the course, so you can go through and test yourself on that.
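The write side is just a multiplication; here’s a matching sketch (again, the helper name is my own):

```python
import math

def write_capacity_units(item_size_kb, items_per_second):
    """Each write unit handles one write per second of an item up to 1 KB."""
    return math.ceil(item_size_kb) * items_per_second

# The two worked examples from this lecture:
assert write_capacity_units(10, 5) == 50
assert write_capacity_units(100, 12) == 1200
```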
The last thing is, what happens if you exceed your write or read throughput? What happens if you’re hammering your DynamoDB table too hard? What are you going to see? You’re going to get a 400 HTTP status code, and it’s going to carry the following error: ProvisionedThroughputExceededException. And that simply means that you’ve exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes. So keep in mind the ProvisionedThroughputExceededException; that can be an exam question as well. You’re always going to get a 400 HTTP status code, and it’s going to say ProvisionedThroughputExceededException (there’s a minimal sketch of catching it at the end of this lecture). So that is it for this lecture, Cloud Gurus. I know it starts out complicated, but as you go through it, it becomes easier and easier. Just run through the steps, and you’ll be able to calculate provisioned read throughput and provisioned write throughput, and you’ll be able to pick up at least two or three questions in your exam. If you have any questions, please let me know. If not, feel free to move on to the next lecture. Thank you.
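As promised, here’s what catching that error looks like in a boto3 sketch (the table name is hypothetical, and note that boto3 will usually retry throttled requests for you before this surfaces):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("ProductCatalog")  # hypothetical table

try:
    table.scan()  # an expensive operation that can exhaust throughput
except ClientError as err:
    # An HTTP 400 with this code means you've exceeded the provisioned
    # throughput for the table or one of its global secondary indexes.
    if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
        print("Throttled - back off and retry")
    else:
        raise
```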
6. Other important aspects of DynamoDB
So let’s start with conditional writes. Now, if you remember, I told you that DynamoDB is basically spread across three separate facilities. Now, inevitably, this can cause some issues. The first issue is that users could potentially be reading the data from separate facilities, or they could be writing the data to separate facilities. So let’s have a look at an example. We’ve got two users, and they’re both sending GetItem requests for an item within our table. It’s got an ID of 1, and its price is $10. So both users get that information back, and it’s the same information in this example. They then both want to update the item at exactly the same time. So one user wants to update the item to be $15, whereas the other user wants to update the item to be $6. Well, if we’re not careful, we can basically end up with conflicting writes to the same item.
So, essentially, both users would write to the same item: it would first be updated to $15, and then to $6. Remember from the other lectures: updates are applied in the order in which they are presented to DynamoDB. So you can have this issue where you’ve got two users writing at the same time, updating the same item with different values. And so the $15 update would come first, and then the $6 update would come second. So how can you prevent this? How can you say, “I only want to update it if a certain condition holds”? Well, that’s called a conditional write. Basically, we’ve got our two users again. They’ve got our item with ID equals one, and the price is $10. They then both want to update the item, but only if the price is still $10. So that’s what’s called a conditional write. Essentially, it’s an if-then statement. So we’re going to send an update for the item if the price equals $10.
So the first user will go through and update the item to $15. The second user will send the exact same request, but because the item is no longer $10 (it’s now $15), that update won’t occur. And this is what we call conditional writes. Conditional writes are what we call idempotent. And this means that you can send the same conditional write request multiple times, but it will have no further effect on that item after the first time DynamoDB performs the specified update. So, just like I said in my other example, suppose you issue a request to update the price of a book by, say, 10%, with the expectation that the price is currently $20. However, before you get a response, a network error occurs, and you don’t know whether your request was successful or not. Because a conditional update is an idempotent operation, you can send the same request again, and DynamoDB will only update the price if the current price is still $20. So that’s what we call idempotency.
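In boto3, a conditional write is an update_item call with a ConditionExpression; a failed condition raises a ConditionalCheckFailedException. A minimal sketch, with hypothetical table and attribute names:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("ProductCatalog")  # hypothetical

try:
    table.update_item(
        Key={"Id": 1},
        UpdateExpression="SET Price = :new",
        ConditionExpression="Price = :expected",  # only if it's still $10
        ExpressionAttributeValues={":new": 15, ":expected": 10},
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("Price has changed since we read it - update skipped")
    else:
        raise
```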
Okay, so let’s quickly talk about atomic counters. DynamoDB supports atomic counters, and this is basically where you use the UpdateItem operation to increment or decrement the value of an existing attribute without interfering with other write requests. It’s also worth noting that all write requests are applied in the order in which they were received. So let’s look at an example. You’ve got a web application that might want to maintain a counter per visitor to the site. In this case, the application would need to increment this counter regardless of its current value. So it’s always going to add one, regardless of whether the counter is at a million or five or whatever; it’s going to update that counter by one for each visitor to the site. So it’s important to remember that atomic counter updates are not idempotent. This means that the counter will increment every time you call UpdateItem.
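An atomic counter is just an UpdateItem with an ADD action; here’s a sketch with hypothetical table, key, and attribute names:

```python
import boto3

counters = boto3.resource("dynamodb").Table("SiteStats")  # hypothetical table

# ADD increments in place, with no read-modify-write cycle,
# so concurrent updates don't interfere with other write requests.
counters.update_item(
    Key={"Page": "homepage"},  # hypothetical key
    UpdateExpression="ADD Visits :inc",
    ExpressionAttributeValues={":inc": 1},
)
```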
So if you suspect that a previous request was unsuccessful, your application can try again with the UpdateItem operation. However, this would risk updating the counter twice. That might be acceptable for a website counter, because you can tolerate slightly overcounting or undercounting the visitors. But if it were, let’s say, a voting application or a banking application, and you needed the data to be 100% accurate, it would be safer to use a conditional update rather than an atomic counter. What might happen in the exam is that you’re going to get a scenario asking you whether you should be using atomic counters or conditional updates. And really, what it comes down to is the data. Is it critical that the data is correct, or can you tolerate some margin of error?
And if you can tolerate a margin of error and you don’t need your updates to be idempotent, then go ahead and use an atomic counter. But if you can’t have any margin of error and you need idempotent updates, then make sure you use conditional writes. And finally, we just move on to batch operations. We’ve kind of seen this a little bit through our policy documents, when we were saying what is and isn’t allowed with DynamoDB in some of the other labs. But basically, if your application needs to read multiple items, you can use the BatchGetItem API. A single BatchGetItem request can retrieve up to one megabyte of data, and that can contain as many as 100 items. In addition, a single BatchGetItem request can retrieve items from multiple tables as well (there’s a minimal sketch at the end of this lecture). So that’s really it for this lecture, guys. In the next lecture, we’re going to summarise everything you’ve learned in this section of the course. So if you’ve got the time, please move on to the next lecture. Thank you.
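As promised, here’s a minimal BatchGetItem sketch; the table names (Users, Forum) and keys are hypothetical:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# One BatchGetItem request: up to 100 items or 1 MB of data,
# and it can pull from multiple tables at once.
response = dynamodb.batch_get_item(
    RequestItems={
        "Users": {"Keys": [{"Id": "212"}, {"Id": "213"}]},
        "Forum": {"Keys": [{"UserId": "212", "PostDate": "2024-01-01"}]},
    }
)
for table_name, items in response["Responses"].items():
    print(table_name, items)
```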
7. DynamoDB Summary
So what is DynamoDB? Well, we learned it’s a fast, flexible, NoSQL database service for all applications that need consistent single-digit millisecond latency at any scale. It’s a fully managed database that supports both document and key-value data models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications. And a lot of big enterprises use DynamoDB.
Now, remember, it is fully managed. You can’t SSH into a DynamoDB database or anything like that. This is a service that’s presented to us by AWS. So, here are some quick facts about DynamoDB. It’s stored on SSD storage. It’s spread across three geographically distinct data centers. You’ve got two different read models. We have our eventually consistent reads, which is the default, and this basically says that consistency across all copies of data is usually reached within 1 second. Repeating a read after a short time should return the updated data, and this gives you your best read performance. And then we have our strongly consistent reads, and a strongly consistent read returns a result that reflects all writes that were successfully received prior to the read. So in DynamoDB, we learned the basics. Basically, it consists of three things.
So we’ve got our tables, and inside our tables we’ve got our items, and inside our items we’ve got different attributes. So here’s our student table, and here are our different items. We’ve got item one and item two. And then we’ve got our attributes: our unique ID, our first name, our surname, our phone number, our address, et cetera. We learned about primary keys. So there are two different types of primary keys available. When you’re dealing with a single attribute, like a unique ID, which is the most common, you’re going to use a partition key. Sometimes this can be referred to as a hash key, as it certainly was in 2015. So if you see “hash key” come up on the exam, they’re just talking about a partition key; they’ve basically changed the wording for it. And our partition key is composed of one attribute. We can then have composite keys, and this is basically a mixture: think of a unique ID and a date range. So this is where we’ve got our partition key and our sort key.
And a sort key can sometimes be referred to as a range key. So you’ve got your hash and range keys. A composite key is composed of two attributes. And typically the best example of this is when you’re running a forum and you’ve got a particular thread up. You might have the user ID, and then you might have a date as to when the user was posting. So you might have a range over a week, a year, a month, or whatever. So why does this matter? Well, basically, it determines how the data is stored in DynamoDB. DynamoDB uses the partition key value as an input to an internal hash function, and the output from this hash function determines the partition. And a partition is simply the physical location in which the data is stored.
Now, it’s important to remember that with a partition key alone, no two items stored in a table can have the same partition key value. So if we’ve got our user table, and we want our user partition key to be a unique ID, we’ll never be able to have the same unique ID across multiple users. So we want that to remain unique. But if we’re looking at a forum, for example, and a user is posting to the forum, we’re going to use the same partition key many times in that forum, because the user will want to post many messages to it. So if we’re doing that, then we need a partition key and a sort key.
So DynamoDB uses the partition key value as the input to an internal hash function again, and the output of this hash function determines the partition, which is simply where the physical data is going to be located. But here two items can have the same partition key; they just have to have different sort keys. So this is perfect for, say, a forum post. And when you do this, all items with the same partition key are stored together, and they’re ordered by the sort key value. So that’s the difference between a partition key on its own and a composite key made up of a partition key and a sort key. Moving on to indexes. We’ve got two different types of indexes, and it’s really important to remember these two.
So we have our local secondary index, and this has the same partition key but a different sort key. And this can only be created when creating a table; it cannot be removed or modified later. And then we have our global secondary index, which has a different partition key and a different sort key. And this can be created at table creation, or it can be added later, and you can delete a global secondary index, as we saw in the labs. Moving on to DynamoDB Streams. Basically, this is used to capture any kind of modification to DynamoDB tables. So if a new item is added to the table, the stream captures an image of the entire item, including all of its attributes. If an item is updated, the stream captures the before and after images of any attributes that were modified in the item. And then, if an item is deleted from the table, the stream captures an image of the entire item before it was deleted. Okay, so let’s look at our example.
So we’ve got Alan Brown. He’s decided to sign up for our website, and he’s created a new item with four different attributes that are then stored in DynamoDB. DynamoDB Streams basically logs this, and the change to the data, or the new data, is stored for 24 hours. That can then trigger a Lambda function, which could place this data in a separate DynamoDB table that might exist in another region, as well as trigger an SES function, which sends our user a “thank you for joining our website” email. So that’s how we can use DynamoDB Streams in multiple different ways.
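For a feel of what that Lambda function sees, here’s a sketch of a handler for DynamoDB stream events (assuming the stream is configured to capture new and old images; the printed actions are placeholders):

```python
# A Lambda handler subscribed to a DynamoDB stream receives one record per
# modification; eventName says whether it was an INSERT, MODIFY, or REMOVE.
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_item = record["dynamodb"]["NewImage"]
            print("New signup:", new_item)  # e.g. send the SES email here
        elif record["eventName"] == "MODIFY":
            print("Before:", record["dynamodb"].get("OldImage"))
            print("After:", record["dynamodb"].get("NewImage"))
        elif record["eventName"] == "REMOVE":
            print("Deleted:", record["dynamodb"].get("OldImage"))
```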
So let’s move on to a query versus a scan. A query operation finds items in a table using only the primary key attribute values, and you’ve got to provide a partition key attribute name and a distinct value. So you have to say ID equals four one two, or something like that, in order to run a query. A scan operation, on the other hand, examines every single item in the table, and by default, a scan returns all of the data attributes for every single item. However, you can use the projection expression parameter so that the scan only returns some of the attributes rather than all of them. And remember that it’s much more efficient to use a query operation over a scan operation, so that’s best practice. If you see an exam scenario, you always want to try and use a query, unless there’s a specific reason why you want to dump the entire table. We then looked at DynamoDB provisioned throughput and how to do our calculations.
So I’ll just pick some random examples here. This one is calculating reads. So we have an application that requires you to read five items. Each item is 10 KB in size per second, and we need to use the eventual consistency model. So we take the ten, round it up to the nearest increment of four, and we get twelve. We then divide that by four, and we know we’re going to need three read units per item. So if we need three read units per item, and we’ve got five items, we’re going to need 15. And then, because we are using eventual consistency, we divide by two. This takes us to 7.5, but as we learned, you can’t use decimal places; it has to be an integer, so we round it up to eight. So it’s just really important, especially when calculating read throughput, to remember that rule where you round things up to increments of four. So in this case, we round up to twelve, we divide by four, and that gives us three read units per item. And we’ve got five items, so we’re going to need 15 in total.
And because we’re using eventual consistency, we divide by two. If we didn’t, and we were using strong consistency, we’d simply divide by one, which yields 15. So remember how to do that. This is some seriously easy material that you can pick up in the exam. Write throughput is extremely simple if you remember how to do it: you just take the two numbers and multiply them out. So in this case, we’ve got an application that requires us to write twelve items. Each item is 100 KB in size, and we have to do this every second. So what do we set the write throughput to? Well, it’s twelve times 100, so we need 1200 write units. Also remember your error codes, in particular what a ProvisionedThroughputExceededException is. So you might be told that you’re basically using all your read throughput and all your write throughput, and asked what error code you would expect to see. The answer is a ProvisionedThroughputExceededException. So remember that going into the exam. Again, it’s going to be worth a few easy marks.
Now let’s talk about the steps that are taken to authenticate with our web identity providers. So in step one, the user authenticates with the identity provider. This could be Google; this could be Amazon; this could be Facebook. Once they’ve authenticated, they’re passed a token from their identity provider. Your code then needs to call the AssumeRoleWithWebIdentity API, provide the provider’s token, and specify the Amazon Resource Name for the IAM role. This call goes to the Security Token Service, and the Security Token Service is basically going to give you back a bunch of temporary credentials and allow you to connect to DynamoDB. And you can connect for 15 minutes to an hour, with 1 hour being the default.
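As a sketch of that flow in boto3 (the role ARN and token are placeholders):

```python
import boto3

sts = boto3.client("sts")

# Step two: exchange the identity provider's token for temporary credentials.
resp = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::123456789012:role/WebAppRole",  # placeholder ARN
    RoleSessionName="web-user-212",
    WebIdentityToken="<token from Google/Amazon/Facebook>",
    DurationSeconds=3600,  # 15 minutes to 1 hour; 1 hour is the default
)
creds = resp["Credentials"]

# Step three: use those temporary credentials to talk to DynamoDB.
dynamodb = boto3.resource(
    "dynamodb",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```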
Moving on to conditional writes. Pretty simple. It just basically says that if the price equals this, then update the price to be that. But if it doesn’t equal this, then don’t do the update. So it’s what we call idempotent: you can send the same conditional write request multiple times, but it will have no further effect on that item after the first time DynamoDB performs the specified update. Moving on to atomic counters. Atomic counters are supported by DynamoDB, and that’s basically where you’re doing something like an UpdateItem operation to increment or decrement the value of an existing attribute without interfering with other write requests. So we used the example of a web application where you’ve got a counter on the site measuring the number of visitors. In this case, the application would basically keep incrementing the counter regardless of its current value. So it’s important to remember that atomic counters are not idempotent. This means that the counter will be incremented each time you call UpdateItem. So if you suspect a request was unsuccessful, your application could retry the UpdateItem operation. However, that could potentially update the counter twice.
So this is probably acceptable for a website counter application, because you can tolerate slightly overcounting or undercounting the number of visitors. But if it’s a banking application, it would be safer to use a conditional update rather than an atomic counter. So, again, going into the exam, look at the scenario. Is it crucial that your data be correct? If so, use conditional writes. If not, use atomic counters. And then finally, we just talked about batch operations. So if your application needs to read multiple items, you can use the BatchGetItem API. A single BatchGetItem request can retrieve up to one megabyte of data, and it can contain as many as 100 items. In addition to this, a single BatchGetItem request can also retrieve items from multiple tables. And finally, this is my best advice to you guys. This is the most important section of the exam: DynamoDB. So if you’re going to read one FAQ before going into the exam, go and read the DynamoDB FAQ. It will really pay dividends for you going into the exam. So that’s it for DynamoDB.
It’s a fascinating subject, and you do need to know it in depth in order to pass the exam. Like I said before, I am making a course right now that will teach you how to build your own iOS app and use a whole bunch of back-end features, including DynamoDB. I’m hoping that will be live on our platform by the end of June. So check it out when you get a chance. Okay, guys, you’ve been really good. Give yourself a big pat on the back. This section is over. Apart from VPC, it is the most difficult section of the course. But that’s it. It’s over. So move on to the next section if you’ve got the time. And if you have any questions, please let me know. Thank you.