6. Neptune Workbench
Now, Neptune also supports what's called the Neptune Workbench, which lets you query your Neptune cluster using notebooks. These notebooks are simply Jupyter notebooks, hosted on the Amazon SageMaker platform, and you can access them from within your AWS console. Behind the scenes, these notebooks run on an EC2 instance hosted in the same VPC as your Neptune cluster, and that instance must have an appropriate IAM role for authentication. Also, the security group that you attach in the VPC where your Neptune cluster is running must have an additional self-referencing rule, that is, a rule that allows inbound connections from the security group itself. And this kind of makes sense, right? The notebook instance and the cluster sit behind the same security group, so the group has to allow traffic from itself; the sketch below shows how such a rule could be created.
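As a rough illustration, here is how such a self-referencing rule could be added with boto3; the security group ID is a placeholder, and 8182 is Neptune's default port:

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder ID of the security group shared by the cluster and the notebook
sg_id = "sg-0123456789abcdef0"

# Allow inbound traffic on the Neptune port from the security group itself
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 8182,
        "ToPort": 8182,
        "UserIdGroupPairs": [{"GroupId": sg_id}],  # self-referencing rule
    }],
)
```

All right, so let's go into a demo and see how to query Neptune data using Jupyter notebooks.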
7. Querying Neptune via Jupyter Notebooks – Hands on
In this demo, let's find out how to use Jupyter notebooks, the notebooks from SageMaker, to query our Neptune data sitting in the Neptune cluster. When we created our Neptune cluster using the CloudFormation template, it also created a notebook. So here I am in the Neptune console, and this is the cluster that we created using the CloudFormation stack. Let's navigate to Notebooks from here. This is the notebook that got created, so let's open it. AWS provides us with some sample notebooks here, so we can navigate to the Neptune folder and then to Getting Started. Here we have four notebooks that you can explore; let's look at the first one. You can use these for learning purposes as well.
So, for example, this first query here checks the cluster status. Remember, this notebook is connected to the Neptune cluster that we created, so if you select this cell and run it, it shows us the status of our cluster. Similarly, you can run different commands. This one is a SPARQL query that inserts SPARQL data, so if you select it and run it, it loads SPARQL data into our cluster. And remember, when we loaded data previously we used Gremlin, and now we are using SPARQL; the two data sets are stored separately. So we can query the data that we just added. Let's run this, and here is the result of the query. The sketch below shows how the same status check and SPARQL requests could be issued from plain Python, outside the notebook.
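This is a minimal sketch using the requests library, assuming IAM authentication is not enforced on the cluster; the endpoint and the example triple are placeholders:

```python
import requests

endpoint = "https://<your-cluster-endpoint>:8182"  # placeholder

# Cluster health, equivalent to the status cell in the notebook
print(requests.get(f"{endpoint}/status").json())

# Insert a triple (hypothetical example data)
update = 'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" . }'
requests.post(f"{endpoint}/sparql", data={"update": update})

# Query the data back
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
print(requests.post(f"{endpoint}/sparql", data={"query": query}).json())
```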
Similarly, we also have Gremlin queries here. I'm not going to run the insert command, because we already have Gremlin data in our database. What we can do instead is run this vertices command, g.V(). Let's run it, and it gives us the data that we previously added: you can see that the nine vertices we added from S3 show up here. You can also use valueMap() to look at the details of those vertices, and it shows us all their properties. Now, we only added one property, but you can have multiple properties on your vertices. The sketch below shows how the same traversals could be run from a standalone Python script.
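A minimal sketch using the gremlin_python driver; the endpoint is a placeholder, and IAM authentication is assumed to be off:

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Neptune's Gremlin endpoint is a WebSocket on port 8182 (placeholder host)
conn = DriverRemoteConnection("wss://<your-cluster-endpoint>:8182/gremlin", "g")
g = traversal().withRemote(conn)

print(g.V().limit(10).toList())   # list vertices, like the g.V() cell
print(g.V().valueMap().toList())  # show vertex properties, like valueMap()

conn.close()
```

All right, let's go back.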
You can also create new notebooks from here: from the New menu, choose Python 3, and name the notebook, say, my-neptune-notebook. Here you can run different commands. For example, we can look at the status of the cluster, and it shows us that. Then, if you want to run some Gremlin commands, you can run g.V(), and it shows us all the vertices. Similarly, you could run g.V('Riyaz').out('Friend'), using the vertex ID and edge label from our earlier data load, and it shows us the friends of Riyaz. So that's about it; you can go ahead and experiment with Jupyter notebooks, it's fairly easy. I hope this gives you an idea of how Neptune works, what graph data is, and how to query graph data, whether from your EC2 instance or from Jupyter notebooks. And before we close, let's delete our CloudFormation stack so it removes all the Neptune resources that we created. Head over to CloudFormation, locate the stack, and delete it; if you prefer to script the cleanup, a sketch follows below.
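A minimal cleanup sketch with boto3; the stack name is a placeholder:

```python
import boto3

cfn = boto3.client("cloudformation")

# Delete the stack that created the Neptune cluster and notebook (placeholder name)
cfn.delete_stack(StackName="my-neptune-stack")

# Optionally block until the deletion completes
cfn.get_waiter("stack_delete_complete").wait(StackName="my-neptune-stack")
```

All right, that's about it. Let's continue to the next lecture.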
8. Neptune replication and high availability
Now let's talk about Neptune replication. This works exactly the same as in Aurora. You can have up to 15 replicas; these replicas use asynchronous replication, and they share the same underlying storage layer as your primary instance, with that storage layer spread across multiple AZs. The replicas typically have a replication lag of tens of milliseconds, and the replication process has minimal performance impact on your primary instance. Replicas also double up as failover targets, and there is no concept of standby instances like we have with RDS. Before we move on, the sketch below shows how a replica could be added to a cluster.
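Adding a read replica boils down to creating another instance in the same cluster; a boto3 sketch with placeholder identifiers and instance class:

```python
import boto3

neptune = boto3.client("neptune")

# Add a reader instance to an existing cluster (placeholder identifiers)
neptune.create_db_instance(
    DBInstanceIdentifier="my-neptune-replica-1",
    DBClusterIdentifier="my-neptune-cluster",
    Engine="neptune",
    DBInstanceClass="db.r5.large",
)
```

All right, so that's about Neptune replication.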
Now let's look at Neptune high availability. Again, this is the same as Aurora. Failovers occur automatically, and a replica is automatically promoted to be the new primary in case of an outage: Neptune flips the CNAME of the database instance to point to the replica and then promotes it. Failover to a replica typically takes about 30 to 120 seconds, so downtime is minimal. And remember that if you do not have any replicas to fail over to, Neptune fails over by creating a new instance; this happens on a best-effort basis and can take about 15 minutes, so it takes considerably longer. If you want to test this behavior, you can also trigger a failover manually, as in the sketch below.
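A boto3 sketch of a manual failover; both identifiers are placeholders, and TargetDBInstanceIdentifier is optional:

```python
import boto3

neptune = boto3.client("neptune")

# Force a failover, optionally promoting a specific replica
neptune.failover_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",
    TargetDBInstanceIdentifier="my-neptune-replica-1",
)
```

All right, so that's about it. Let's continue to the next lecture.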
9. Neptune backup and restore
Now, let's talk about backup and restore options with Neptune. These are the same as you have with RDS. Neptune supports automatic backups, and it continuously backs up your data to S3 for PITR, or point-in-time recovery. The maximum retention period is 35 days. The latest restorable time for PITR can be up to five minutes in the past, because Neptune uploads logs to S3 every five minutes. So you have an RPO of five minutes, which means you can lose a maximum of five minutes' worth of data. The first backup is always a full backup, while subsequent backups are incremental.
As mentioned earlier, automatic backups can be retained for a maximum of 35 days, and you can use manual snapshots if you want to retain backups beyond 35 days. The backup process does not impact your cluster performance, because, as we already know, the storage layer in Neptune is separate from the cluster instances, and backups are always taken from the storage layer. Further, you can only restore to a new cluster. You can restore an unencrypted snapshot to an encrypted cluster, but not the other way around: you take your unencrypted manual snapshot, enable encryption at rest, and restore it to a new, encrypted cluster, as in the sketch below.
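A boto3 sketch of restoring a snapshot into a new, encrypted cluster; the identifiers and KMS key alias are placeholders:

```python
import boto3

neptune = boto3.client("neptune")

# Restore the snapshot to a brand-new cluster, enabling encryption at rest
neptune.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="my-restored-cluster",
    SnapshotIdentifier="my-manual-snapshot",
    Engine="neptune",
    KmsKeyId="alias/my-neptune-key",  # supplying a key encrypts the new cluster
)
```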
And to restore a cluster from an encrypted snapshot, you must have access to the KMS key that was originally used for the encryption. Remember, you can only share manual snapshots; if you want to share automated ones, you have to copy the automated snapshot to a manual snapshot first, and then you can share that manual snapshot, as in the sketch below. Also, you cannot share a snapshot that was encrypted using the default KMS key of your account. And you can share snapshots across accounts, but they have to be within the same region.
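A boto3 sketch of the copy-then-share flow; all identifiers, including the target account ID, are placeholders:

```python
import boto3

neptune = boto3.client("neptune")

# Copy the automated snapshot to a manual snapshot
neptune.copy_db_cluster_snapshot(
    SourceDBClusterSnapshotIdentifier="rds:my-neptune-cluster-2024-01-01-00-00",
    TargetDBClusterSnapshotIdentifier="my-manual-copy",
)

# Share the manual copy with another AWS account
neptune.modify_db_cluster_snapshot_attribute(
    DBClusterSnapshotIdentifier="my-manual-copy",
    AttributeName="restore",
    ValuesToAdd=["123456789012"],
)
```

So that's about the backup and restore options in Neptune. Let's continue to the next lecture.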
10. Database cloning in Neptune
Now let's look at database cloning in Neptune. Again, this is exactly the same as the cloning we have in Aurora clusters. Cloning is different from read replicas, as clones support both reads and writes. Cloning is also different from replicating your cluster, because clones use the same storage layer as your source cluster, and because of that, clones require only minimal additional storage. So cloning is quick and cost-effective. And remember, you can create clones only within the same region as your source cluster, but they can be in a different VPC.
You can create clones from existing clones, and cloning uses what's called a copy-on-write protocol. So you have your source data and you have your clone, and initially both the source and the clone share the same data. Any changes that happen on either the source or the clone after the clone is created are stored separately from the original set of data: if there is a change on the source, that data is stored separately, and similarly, if there is a change on the clone, that data is also stored separately. This delta of writes after cloning is not shared between the source and the clone. The sketch below shows how a clone could be created.
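A clone can be created through the restore-to-point-in-time API with the copy-on-write restore type; a boto3 sketch with placeholder identifiers:

```python
import boto3

neptune = boto3.client("neptune")

# Clone an existing cluster; copy-on-write shares the source storage layer
neptune.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="my-neptune-clone",
    SourceDBClusterIdentifier="my-neptune-cluster",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)
```

All right.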
11. Neptune security
Now, let's look at Neptune security, first the IAM part. We use IAM for authentication and authorization to manage Neptune resources. Neptune supports IAM authentication, and for that we use AWS SigV4. SigV4 is something you use to sign your API requests with user credentials or with temporary credentials, and you get temporary credentials by using an assumed role. So you typically create an IAM role and then set up the trust relationship; for example, if you're using EC2 or Lambda, these services can assume the IAM role on your behalf and then connect to your Neptune cluster.
So you simply retrieve the temporary credentials from STS, use the SigV4 signing process to sign your API requests, and then send those requests to your Neptune cluster. The sketch below shows this flow.
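A minimal sketch of the signing flow using botocore's SigV4 implementation; the endpoint and region are placeholders, and "neptune-db" is the service name used when signing Neptune data-plane requests:

```python
import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

url = "https://<your-cluster-endpoint>:8182/status"  # placeholder endpoint

# Temporary credentials come from the session (e.g. an assumed role via STS)
creds = boto3.Session().get_credentials()

# Sign the request with SigV4 for the Neptune data plane
aws_request = AWSRequest(method="GET", url=url)
SigV4Auth(creds, "neptune-db", "us-east-1").add_auth(aws_request)

# Send the signed request
response = requests.get(url, headers=dict(aws_request.headers))
print(response.json())
```

All right, so that's about IAM.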
Now let's look at encryption and networking. Neptune supports encryption in transit using SSL/TLS; for this, you set the cluster parameter neptune_enforce_ssl to 1, and it is set to 1 by default. Neptune also supports encryption at rest with the AES-256 algorithm using KMS, and this encrypts your data, automated backups, and snapshots, as well as the replicas in the same cluster. Remember that Neptune clusters are VPC-only, so you typically use private subnets to host your Neptune clusters, while the client applications can run on EC2 instances in public subnets within the VPC. And if you want to connect from your on-premises infrastructure, you can use a VPN for that purpose. Just like with any other AWS service, you use security groups to control access. All right, so that's about Neptune security. Let's continue to the next lecture.
12. Neptune monitoring
Now, let's look at the monitoring aspects of Neptune. You can use the instance health status API at the /status endpoint to monitor the health of your cluster, and the other monitoring metrics are provided by CloudWatch, as usual. You can also use audit log files by enabling the DB cluster parameter neptune_enable_audit_log, as in the sketch below. Audit logs are something you use to monitor your database activity: logins, failed logins, the different queries that are performed, and so on. Note that you have to restart your database cluster after you enable these audit logs.
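A boto3 sketch of turning the audit log parameter on; the parameter group name is a placeholder, and the change takes effect after a restart, as noted above:

```python
import boto3

neptune = boto3.client("neptune")

# Enable audit logging on the cluster's parameter group (placeholder name)
neptune.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-neptune-cluster-params",
    Parameters=[{
        "ParameterName": "neptune_enable_audit_log",
        "ParameterValue": "1",
        "ApplyMethod": "pending-reboot",  # applied on the next restart
    }],
)
```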
And remember, any audit logs beyond 100 MB will be rotated by default, and this is not configurable. These audit logs are also not stored in sequential order, but you can order them using the timestamp value of each record. If you want this audit log data exported to CloudWatch Logs, you can modify your cluster and enable log exports for the audit logs, just like you would with any RDS engine; a sketch follows below. And just like with any other AWS service, all API calls are logged with, no points for guessing it, CloudTrail.
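A boto3 sketch of enabling the audit log export to CloudWatch Logs; the cluster identifier is a placeholder:

```python
import boto3

neptune = boto3.client("neptune")

# Start exporting audit logs to CloudWatch Logs (placeholder identifier)
neptune.modify_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",
    CloudwatchLogsExportConfiguration={"EnableLogTypes": ["audit"]},
)
```

All right, let's continue.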
13. Query queuing in Neptune
Now, let's talk about query queuing in Neptune. A maximum of 8,192 queries can be queued up per Neptune instance; beyond this, the cluster throws a ThrottlingException. You can use the CloudWatch metric MainRequestQueuePendingRequests to get the number of queries queued, and this metric has five-minute granularity. You don't have to remember the name of this metric; just remember that you can queue up to 8,192 queries at a time and that you can monitor this using CloudWatch, as in the sketch below.
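A boto3 sketch of reading that metric from CloudWatch; the dimension name and cluster identifier are placeholder assumptions:

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch")

# Fetch the queued-request metric at its five-minute granularity
resp = cw.get_metric_statistics(
    Namespace="AWS/Neptune",
    MetricName="MainRequestQueuePendingRequests",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "my-neptune-cluster"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,  # 5 minutes
    Statistics=["Maximum"],
)
print(resp["Datapoints"])
```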
Additionally, you can use the acceptedQueryCount value returned by the query status API. For Gremlin, acceptedQueryCount contains the current count of queued queries, while for SPARQL it contains the count of all queries accepted since the server started. Again, you don't have to remember the names of these metrics; just remember that a maximum of 8,192 queries can be queued up at a time, and that you can use CloudWatch metrics to monitor this number. All right, let's continue.
14. Neptune service errors
Now, let's look at some of the service errors that you might encounter with Neptune. First off, you have graph engine errors. These are errors related to your cluster endpoints and are typically HTTP error codes. Then you can have query errors like QueryLimitException, MemoryLimitExceededException, TooManyRequestsException, and so on; these are kind of self-explanatory, right? You can also have IAM authentication errors, like a missing authentication token or an invalid signature; again, these are self-explanatory. Then you have API errors.
These are HTTP errors related to the management APIs, and you would see them when you use the CLI or an SDK to talk to Neptune. Some examples are InternalFailure, AccessDeniedException, MalformedQueryString, ServiceUnavailable, and so on. And if you're loading data into Neptune using the loader endpoint, you might see errors like LOAD_NOT_STARTED, LOAD_FAILED, LOAD_S3_READ_ERROR, LOAD_DATA_DEADLOCK, etc. For example, say you are loading Gremlin data and your CSV file is not formatted correctly; when you run the bulk load command, you might see a LOAD_FAILED error. All right, so that's about Neptune service errors. Let's continue to the next lecture.
15. SPARQL federated query
Now, let's talk about the SPARQL federated query. You use a federated query to query multiple Neptune clusters, or external data sources that support this protocol, and it returns the aggregated results. It only supports read operations. So, say you have your client application, your client EC2 instance, and two Neptune clusters. You send your federated query, let's say Q1 plus Q2, to your EC2 instance. It passes the query on to the first Neptune cluster, so Q1 plus Q2 goes to Neptune cluster 1, and Neptune cluster 1 executes query Q1 and passes query Q2 on to Neptune cluster 2.
Neptune cluster 2 then returns its results, let's say R2, to Neptune cluster 1, and Neptune cluster 1 combines, or aggregates, its own results with the results received from cluster 2. So R1 plus R2 is sent back to the EC2 instance, and from the EC2 instance back to the calling application. This is how a federated query works; you can use Neptune clusters or any external data sources that support the federated query protocol. The sketch below shows what such a query looks like.
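Federation is expressed with the SPARQL 1.1 SERVICE keyword. A minimal sketch that sends such a query to cluster 1, which federates part of it out to cluster 2; both endpoints and the example predicates are placeholders, and IAM authentication is assumed to be off:

```python
import requests

cluster1 = "https://<cluster-1-endpoint>:8182/sparql"  # receives the federated query

# The SERVICE clause tells cluster 1 to evaluate that pattern on cluster 2
query = """
SELECT ?person ?city WHERE {
    ?person <http://example.org/knows> ?friend .       # Q1: runs on cluster 1
    SERVICE <https://<cluster-2-endpoint>:8182/sparql> {
        ?friend <http://example.org/livesIn> ?city .   # Q2: runs on cluster 2
    }
}
"""
resp = requests.post(cluster1, data={"query": query})
print(resp.json())
```

All right, so that's about the SPARQL federated query. Let's continue to the next lecture.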
16. Neptune streams
Now let's talk about Neptune streams. Streams capture changes to your graph data, so these are a kind of change log. They are similar to DynamoDB streams, and you can process them with Lambda functions by using the Neptune streams API. If you're using SPARQL, the endpoint is /sparql/stream, and for Gremlin it's similar: /gremlin/stream. Only the GET method is allowed when you use streams. As an example use case, you can integrate this with Elasticsearch: your Neptune data changes are relayed to a Neptune stream, which can be consumed by a Lambda function, which updates the data in your Elasticsearch service.
The Elasticsearch service can then let you perform full-text search queries on your Neptune data. This typically uses a combination of streams and federated queries, and it is supported both for Gremlin and for SPARQL. You can also use Neptune streams for replication purposes; for example, you can perform cluster-to-cluster replication using Neptune streams, replicating from one Neptune cluster to another. The sketch below shows what reading from a stream could look like.
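A minimal sketch of polling the Gremlin change stream over HTTP; the endpoint is a placeholder, the iterator parameters follow the streams API, IAM authentication is assumed to be off, and the record field names are assumptions:

```python
import requests

stream_endpoint = "https://<your-cluster-endpoint>:8182/gremlin/stream"

# Read the oldest available change records, at most 10 of them
resp = requests.get(
    stream_endpoint,
    params={"iteratorType": "TRIM_HORIZON", "limit": 10},
)

for record in resp.json().get("records", []):
    # Each record describes one change operation and its payload
    print(record.get("op"), record.get("data"))
```

All right, so that was about Neptune streams. Let's continue to the next lecture.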
17. Neptune pricing
Now let's talk about Neptune pricing. With Neptune, you only pay for what you use. You pay for on-demand instances at per-hour pricing, and you pay for I/O per million I/O requests: every database page read operation in Neptune is counted as one I/O, each page being 16 KB, while write I/Os are counted in 4 KB units. In addition to I/O, you pay for database storage on a per-GB-per-month basis, and you also pay for backups, both automated and manual, on a per-GB-per-month basis. Just like with any other AWS service, you pay per GB for data transfer as well. And if you're using the Neptune Workbench Jupyter notebooks, you pay per instance-hour for the instance that runs your notebooks. The sketch below illustrates the I/O accounting.
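A back-of-the-envelope illustration of the read and write I/O counting described above; the usage numbers and the per-million-I/O rate are made-up placeholders, not actual AWS prices:

```python
# Hypothetical monthly usage
page_reads = 50_000_000          # each 16 KB page read counts as one read I/O
bytes_written = 200 * 1024**3    # 200 GiB written; write I/Os counted in 4 KB units

read_ios = page_reads
write_ios = bytes_written // (4 * 1024)

price_per_million_ios = 0.20     # placeholder rate, NOT an actual AWS price

io_cost = (read_ios + write_ios) / 1_000_000 * price_per_million_ios
print(f"Estimated monthly I/O cost: ${io_cost:.2f}")
```

So that's all about Neptune, and this is also the end of this section. Let's continue to the next section.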