7. DMS tasks and task assessment reports
Now let’s talk about DMS tasks, or replication tasks. These let you specify the migration options, as we have already seen in the demo. You can provide table mappings or transformation rules, you can define filters to migrate data selectively, and you can also define the migration type, whether you want to perform a full load, CDC, or both. The DMS task also supports data validation. It provides monitoring options, like task statistics and control tables, and it also allows you to reload the table data in case of errors. So if your DMS task results in an error, you can reload the table data again after making the appropriate changes. And just like the SCT assessment reports, DMS also provides task assessment reports.
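Just to make that concrete, here’s a rough sketch of creating such a task through the API with boto3; the ARNs, schema name, and task identifier are placeholders, not values from the demo:

```python
import json
import boto3

dms = boto3.client("dms")

# Table mapping: include every table in the "sales" schema (placeholder names).
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales-schema",
            "object-locator": {"schema-name": "sales", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="sales-migration-task",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # or "full-load" or "cdc"
    TableMappings=json.dumps(table_mappings),
)
```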
And these are for premigration assessment. Again, they are optional but recommended. To run a task assessment, the task must be in a stopped state; it cannot be running. What it gives you is a heads-up on the potential issues that might occur during a given migration. This assessment report is not in PDF format like the SCT assessment report; it’s a simple JSON file containing a summary and details of unsupported data types. You can see a sample DMS task assessment report here. It shows you which data types are supported for the migration and which of them are only partially supported or not supported at all.
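As a small sketch of what running that assessment looks like with boto3 (the task ARN is a placeholder); remember the task has to be stopped before you start it:

```python
import boto3

dms = boto3.client("dms")
task_arn = "arn:aws:dms:us-east-1:123456789012:task:MY-TASK"  # placeholder

# The task must not be running when the assessment starts.
dms.start_replication_task_assessment(ReplicationTaskArn=task_arn)

# Later, read back the results; the linked S3 object is the JSON report
# with the summary and the unsupported / partially supported data types.
results = dms.describe_replication_task_assessment_results(
    ReplicationTaskArn=task_arn
)
for r in results["ReplicationTaskAssessmentResults"]:
    print(r["AssessmentStatus"], r.get("S3ObjectUrl"))
```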
8. DMS migration types
Now let’s look at the different migration types that are supported in DMS. First off, we have full load. With full load, you migrate your existing data from the source database to the target database. You migrate the entire set of source data to the target, so this is a one-time migration from source to target. Then we have full load plus CDC, which migrates your entire set of data and also performs replication of ongoing changes. It first performs the one-time migration from source to target and then continues replicating data changes.
And if you just want to replicate data changes without actually migrating the existing data, you can use CDC only. So if you have already migrated or performed a full load before, then you can use the CDC-only option for ongoing replication. It only replicates data changes from source to target without migrating the entire set of data. CDC stands for change data capture. All right, so those are the three migration types we have in DMS. Let’s continue.
9. DMS – Good things to know
Now let’s look at some of the good things to know about DMS. It’s recommended to create only the primary keys before the full load. Secondary indexes and foreign keys should only be created after your full load is complete. This is because in a full load, multiple tables are loaded in parallel, and this can create issues if you have foreign key relationships. If a table that has a foreign key relationship is loaded before the table it references, it might result in errors.
So it’s always a good idea to create these relationships later, once your full load is complete. And since secondary indexes can slow down the full load operation, it’s a good idea to create them later as well, so your full load performance is not affected. If you’re using ongoing replication, then it’s a good idea to use a Multi-AZ setup for high availability and failover support. And remember that you can definitely use DMS with encrypted databases as well; DMS can read and write from and to encrypted databases.
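Going back to that point about deferring the secondary indexes and foreign keys, here is a loose sketch of the idea; it assumes a MySQL-compatible target reachable with pymysql, and all of the table, column, and connection names are made up:

```python
import pymysql  # assumes a MySQL-compatible target database

# DDL deferred until the DMS full load has completed (placeholder names).
post_full_load_ddl = [
    "ALTER TABLE orders ADD INDEX idx_orders_customer_id (customer_id)",
    "ALTER TABLE orders ADD CONSTRAINT fk_orders_customer "
    "FOREIGN KEY (customer_id) REFERENCES customers (id)",
]

conn = pymysql.connect(host="target-db.example.com", user="admin",
                       password="...", database="sales")
try:
    with conn.cursor() as cur:
        for ddl in post_full_load_ddl:
            cur.execute(ddl)
    conn.commit()
finally:
    conn.close()
```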
10. Migrating large tables and LOBs with DMS
Now let’s talk about migrating large tables. When you have to migrate large tables, it’s a good idea to break the migration into multiple tasks. You can use what’s called row filtering on a particular key or partition key to create multiple tasks. For example, if you have an integer primary key ranging from one to, let’s say, 8 million, you can create eight different tasks using row filtering to migrate 1 million records each. Another example: if you have a date field as a primary key, you can partition your data by month using row filtering, and you can use full load tasks to migrate the data of previous months and then full load plus CDC to migrate the current month’s data.
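Here’s a rough sketch of the table mapping for one of those eight tasks; the schema, table, and column names are placeholders, and each task would use the same selection rule with a different between range:

```python
# Table mapping for task 1 of 8: rows with id from 1 to 1,000,000.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "orders-id-1-to-1m",
            "object-locator": {"schema-name": "sales", "table-name": "orders"},
            "rule-action": "include",
            "filters": [
                {
                    "filter-type": "source",
                    "column-name": "id",
                    "filter-conditions": [
                        {
                            "filter-operator": "between",
                            "start-value": "1",
                            "end-value": "1000000",
                        }
                    ],
                }
            ],
        }
    ]
}
```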
So these are different ways you can use to migrate your large tables. Now let’s talk about migrating large objects, or LOBs. A LOB is a large object, such as a BLOB (binary large object) or a CLOB (character large object), and DMS can migrate these large objects as well. DMS migrates LOB data in two phases. First it creates a new row in the target table and populates all the data except the LOB data, and in the second phase it updates that row with the LOB data. So this is how DMS migrates LOB data. Now, you have a couple of LOB options that you can use. The first one is not to include LOB columns. If you choose this, then your LOB data will not be migrated and will simply be ignored.
Then we have full LOB mode, which migrates all your LOB data piecewise, in chunks, and you provide the chunk size. So in this example, we are using full LOB mode with a chunk size of 64 KB, which means your LOBs will be divided into 64 KB chunks and then migrated. And then we have a third option called limited LOB mode, which will truncate your LOBs to the max LOB size. So if you use limited LOB mode with a 32 KB max LOB size, then any LOBs beyond 32 KB will be truncated. Of course this is faster, but it will truncate your data. So if you already know the maximum LOB size in your source database, then you can definitely use the limited LOB mode with that particular maximum LOB size.
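In the task settings JSON, these LOB options sit under TargetMetadata; a minimal sketch of the two modes from this example (sizes are in KB) would look something like this:

```python
# Full LOB mode, migrating LOBs in 64 KB chunks.
full_lob_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": True,
        "LobChunkSize": 64,
        "LimitedSizeLobMode": False,
    }
}

# Limited LOB mode: anything beyond 32 KB gets truncated.
limited_lob_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": False,
        "LimitedSizeLobMode": True,
        "LobMaxSize": 32,
    }
}
```

Either of these would be passed as the ReplicationTaskSettings JSON when creating the task.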
Now let’s look at some best practices for handling these LOBs with DMS. When you use full LOB mode, the performance is slowest, but your data is not truncated. For LOBs smaller than a few MB, you should use limited LOB mode with the max LOB size equal to the largest LOB in your database. So if your largest LOB is just a few MB in size, then limited LOB mode will give you the best performance, but make sure that you allocate enough memory to your replication instance. If your LOBs are larger, say they run into hundreds of MB, then you should create a separate DMS task with full LOB mode, and ideally separate tasks on a new replication instance.
And the LOB chunk size should allow DMS to capture most LOBs in as few chunks as possible; so you should define your LOB chunk size such that fewer LOB chunks are created. Apart from this, we also have what is called inline LOB mode. This is something that’s only available when you choose full LOB mode. It combines the advantages of both the full LOB mode and the limited LOB mode and allows you to migrate your data without truncating anything and without slowing down your task performance. The way it works is that you specify the inline LOB max size. To specify this, you have to choose full LOB mode, and then you will be able to provide the inline LOB max size.
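A minimal sketch of that combination in the task settings, again under TargetMetadata and with full LOB mode switched on:

```python
inline_lob_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": True,      # must be on to use inline LOB mode
        "LobChunkSize": 64,
        # LOBs up to 64 KB are transferred inline during the full load;
        # larger ones are fetched via a lookup against the source table.
        "InlineLobMaxSize": 64,
    }
}
```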
When you use this mode, the way DMS handles it is that small LOBs are transferred inline, and the larger ones are transferred using a source table lookup; DMS is going to look up the source table and then transfer the larger LOBs. And remember that full LOB mode must be on to use inline LOB mode, and this is only supported during the full load operation, so you cannot use it for CDC purposes. So these were some of the best practices for handling LOBs when you use DMS. All right, so let’s continue to the next lecture.
11. DW migration with SCT
Now let’s talk about the SCT extractors. We use SCT extractors for data warehouse migration. We have already seen the flow for data warehouse migration: we have a source data warehouse, we use SCT to migrate the schema and move data to S3, and then we move the data from S3 to Redshift. And these are the supported data sources: you can migrate from Oracle, SQL Server, Teradata, Netezza, Vertica, and Greenplum. The SCT extractors are migration agents that are installed locally, or you can also install them on EC2, but generally as close to the source data warehouse as possible. And they are designed to extract data from your data warehouse in parallel.
They do support encryption, and the extracted data is optimized for Redshift and is first stored in local files. Once the data is extracted, the data files are loaded into S3. You can transfer this data over the network, or you can use Snowball Edge as well, and we’re going to talk about this in a bit. And once the data is in S3, we already know how we can move data from S3 to Redshift: we simply use the Copy command to load data from S3 into Redshift. We have done a hands-on of this Copy command in the Redshift section, right? And if you’re using Oracle or SQL Server, you can use DMS to keep your databases in sync with Redshift or the target engine.
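Just as a refresher on that Copy command, here’s a rough sketch; the bucket, table, IAM role, and connection details are placeholders, and it assumes a psycopg2 connection to the Redshift cluster:

```python
import psycopg2  # assumes network access to the Redshift cluster

copy_sql = """
    COPY sales.orders
    FROM 's3://my-migration-bucket/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    GZIP
    DELIMITER '|';
"""

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
try:
    with conn.cursor() as cur:
        cur.execute(copy_sql)
    conn.commit()
finally:
    conn.close()
```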
So that’s a good thing to remember: if you have Oracle or SQL Server, use DMS to keep your databases in sync with Redshift. Now let’s look at the data warehouse migration in a little more detail. You have your on-premises data center and the AWS cloud. Your Redshift data warehouse is in the AWS cloud, and your source data warehouse is in the on-premises data center. First you migrate your schema using SCT; you have SCT installed on premises, and you migrate your schema to Redshift. Then you use SCT extractors to migrate your data. The SCT extractors are agents that are installed in your on-premises data center; they are part of SCT, and you can use multiple hosts to run these extractors. One host can run multiple extractors as well.
So you can parallelize, scale out, or scale up for performance gains. And once you have extracted your data using the SCT extractors, you can then move that data to S3. Option one is to move that data using an S3 client: you move the compressed data files over the network. The second option is to copy these compressed data files using an S3 adapter to a Snowball Edge device. You ship this Snowball Edge device to AWS, and then AWS is going to copy the data from the Snowball Edge into S3. The Snowball Edge is a physical device that AWS will ship to you; AWS physically ships the device to your premises, you copy the data onto it, and you ship it back to AWS.
AWS will then move this data into S3, and this typically takes about a week or so. All right, so there are a couple of devices that AWS offers in the Snowball range, and two devices in particular are useful for data warehouse migration: Snowball Edge and Snowmobile. So let’s look at both of these; both the Snowball Edge and the Snowmobile can be used for data migration. Snowball Edge looks something like this, and Snowmobile looks something like this. Snowball Edge is a handy device: it offers about 100 TB of storage, about 80 TB of which is usable, and the data transfer takes approximately a week.
And a Snowmobile is a 45-foot-long container that will come to your doorstep, and it can store up to 100 petabytes of data. The data transfer can take about six months, but when you’re looking at 100 petabytes of data, six months is still a very small amount of time; if you had to transfer that amount of data over a network, it would take decades. All right, so these are the devices that you can use to migrate huge amounts of data from your on-premises data centers to the AWS cloud. That’s about data warehouse migration. Let’s continue to the next lecture.
12. Migration playbooks
Alright, now let’s talk a little bit about migration playbooks. These migration playbooks are a series of step-by-step guides published by AWS, and they provide best practices and blueprints for different heterogeneous migrations. You can access these playbooks from the link aws.amazon.com/dms/resources, or you can also access them from within your DMS dashboard; from the home page of your DMS console, you can get to these playbooks. All in all, we have SCT for schema conversion, DMS for data migration, and the playbooks for best practices and blueprints.
If you access the resources page of DMS, then you’re going to see playbooks like these; these are the playbooks that are available as of now. You can use these migration playbooks to migrate your workloads or databases from SQL Server or from Oracle to the Aurora engines, Aurora PostgreSQL or Aurora MySQL, and you can also use them to migrate from an Oracle data warehouse to Redshift. If you like, you can definitely go over to this link and review the different playbooks. These playbooks also provide something called service substitutions, which are highlighted in the playbooks.
Service substitutions are essentially workarounds you can use to provide functionality that you natively have in the source database but that is not available in the target database. For example, if there are some features that Oracle or Microsoft SQL Server supports natively and those are not supported in Aurora MySQL or Aurora PostgreSQL, then you can use these kinds of workarounds, the service substitutions, to replicate similar functionality. For example, in SQL Server we have the Database Mail feature to send emails based on certain events, and this feature is not available in the Aurora engines, Aurora MySQL and Aurora PostgreSQL.
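One way around that, as the playbooks suggest, is a small Lambda function that publishes to an SNS topic; the sketch below uses made-up names (the topic ARN and event fields are placeholders), and the database side would typically invoke the function through Aurora MySQL’s native Lambda integration:

```python
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:db-mail-topic"  # placeholder

def lambda_handler(event, context):
    # Emulates SQL Server's Database Mail: the database invokes this function
    # with a subject and body, and SNS delivers it to the email subscribers.
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=event.get("subject", "Database notification"),
        Message=event.get("body", json.dumps(event)),
    )
    return {"status": "sent"}
```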
So what you can do is use Lambda functions invoked from Aurora MySQL, along with SNS, to emulate the email functionality, as in the sketch above. Similar substitutions are possible with queues, file management, and so on. You can also use AWS Glue to manage ETL pipelines, and you can use SCT to convert your Teradata or Oracle scripts and move them into AWS Glue by auto-generating the necessary Python code. So these are the different workarounds you can use, and they are highlighted in the playbooks. All right, so that was a quick lecture about playbooks. Let’s continue to the next lecture.
13. DMS monitoring
Now, let’s talk about monitoring DMS. There are a couple of ways you can monitor the progress of your DMS tasks: you can check the task status in the DMS console, you can use the task’s control tables, or you can use CloudWatch for this purpose. The DMS task log and service metrics are provided by CloudWatch, and the Task Monitoring tab shows these CloudWatch metrics. You can access the Task Monitoring tab in your DMS console to see the different metrics and statistics that are provided through the CloudWatch service. You also have the Table Statistics tab, which shows the statistics of the migration for each table, and we have seen this in the demo where we migrated a simple MySQL table.
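And if you want to check that task status programmatically rather than in the console, here is a quick boto3 sketch (the task identifier is a placeholder):

```python
import boto3

dms = boto3.client("dms")

tasks = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-id", "Values": ["sales-migration-task"]}]
)
for task in tasks["ReplicationTasks"]:
    stats = task.get("ReplicationTaskStats", {})
    print(task["ReplicationTaskIdentifier"], task["Status"],
          stats.get("FullLoadProgressPercent"), stats.get("TablesErrored"))
```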
And you can also subscribe to event notifications, which are provided through the SNS integration. As usual, the API calls are logged using CloudTrail. Now, let’s talk about the task logs. Certain DMS issues, errors, or warnings only appear in the task log; for example, any data truncation issues or row rejections that happen due to foreign key violations will only be written to the task log. And to see this task log, you have to enable CloudWatch logs when you create your replication task. So it’s important to remember that this information is only available in the task logs. All right, let’s continue.
14. DMS validation
Now let’s talk about DMS validation. DMS has the capability to validate your migrated data for relational database migrations. After completing the migration process, DMS can validate whether the source data has been correctly migrated or not. DMS also supports partial validation of LOBs, or large objects. We enable validation in the DMS task settings, and I mentioned this in the DMS hands-on video. The way it works is that DMS tracks the progress of the migration and incrementally validates new data as it gets written to the target, and it does so by comparing the source data to the target data. And remember that your table must have a primary key or a unique index for validation to work.
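That setting lives under ValidationSettings in the task settings JSON; a minimal sketch:

```python
validation_settings = {
    "ValidationSettings": {
        "EnableValidation": True,  # compare source and target rows as they land
        "ThreadCount": 5,          # number of validation threads
    }
}
# Passed as part of ReplicationTaskSettings when creating the task.
```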
So this is important to remember: if your table does not have a primary key or a unique index, then DMS validation will not work. And of course, as mentioned previously in the demo as well, validation requires additional time to complete, so you should generally do your validation during the CDC phase instead of during the full load, though you can also do it during the full load if you like. Also, only DMS provides the validation option; the SCT extractors do not support validation. So if you are doing a data warehouse migration, then validation is not supported, but DMS does provide you the validation options. All right, so that was about DMS validation. Let’s continue to the next lecture.
15. DMS statistics and control tables
Now let’s talk about the table statistics. The Table Statistics tab shows the different table states, or the performance statistics, for each table that’s being migrated. Alternatively, you can use the describe-table-statistics command to receive the data validation report in JSON format, and it looks something like this: it will tell you the validation state, how many records have been inserted, how many errors happened, and so on. All right, so that’s about table statistics. Now let’s look at the task statistics. The task statistics show the table-level statistics for the given DMS task.
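By the way, that describe-table-statistics call looks roughly like this with boto3 (the task ARN is a placeholder), and as you can see it also reports the per-table validation state:

```python
import boto3

dms = boto3.client("dms")
stats = dms.describe_table_statistics(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:MY-TASK"  # placeholder
)
for t in stats["TableStatistics"]:
    print(t["SchemaName"], t["TableName"], t["ValidationState"],
          t["Inserts"], t["ValidationFailedRecords"])
```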
If you enable data validation, then DMS is going to provide you table-level statistics for the given task, and it indicates the validation state for each table. And you can of course revalidate tables from the Table Statistics tab if you like. Any validation errors or diagnostic info are written to a table named awsdms_validation_failures_v1 at the target endpoint. So after your migration, if you go to your target database and look for this table, then you’re going to see the different validation errors and diagnostic information, if any, that were recorded during the migration process.
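As a quick sketch of querying that table on a MySQL-compatible target (the connection details and the task name are placeholders, and the exact column names can vary by DMS version):

```python
import pymysql  # assumes a MySQL-compatible target database

conn = pymysql.connect(host="target-db.example.com", user="admin",
                       password="...", database="sales")
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM awsdms_validation_failures_v1 WHERE TASK_NAME = %s",
            ("sales-migration-task",),  # placeholder task name
        )
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()
```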
So for example, to troubleshoot validation errors, you can run a query like the one sketched above, a simple select star from awsdms_validation_failures_v1 filtered on your task name, look for any errors, and then take appropriate action. In addition to this, we know that DMS also has control tables that help with DMS migrations. These control tables provide useful statistics to help you plan and manage your migration tasks; this can be your current migration task or future ones. And you create these when creating the replication task.
And we did look at this during the hands-on, right? So here are some of the control tables that we have. We have the apply exceptions table, which is enabled by default and contains the error logs. Then we have the status table, which contains the replication status of the current task. We have the suspended tables table, which is the list of suspended tables along with the reason for suspension. And we have a history table, which contains the replication history along with different statistics. These control tables are useful for troubleshooting your DMS migration tasks. All right, so that’s about the different monitoring and troubleshooting aspects of DMS migration tasks. Let’s continue to the next lecture.