What Is Data Migration?
Data migration is the process of transferring data between storage systems, computer systems or data formats. Organizations migrate data for various reasons, for example when upgrading or replacing storage equipment or servers.
Data is also migrated when moving to third-party cloud providers, consolidating websites, performing infrastructure maintenance or upgrading software, as well as during data center relocations and company mergers.
In this article, I’ll explain the basic concepts of migrating datasets and present a general process for migrating your data. Finally, I’ll show a practical example – how to migrate data from an on-premises data center to the Azure cloud.
Types of Data Migration
Here are several data migration types:
- Storage migration—involves the transfer of data from a certain storage repository to another. Storage migration can take place on-premises or in cloud environments. It is generally considered the most straightforward migration, but still requires a solid migration strategy. The target of storage migration in the cloud may be object storage, file storage, or block storage.
- Database migration—the process of moving a database to a new or upgraded database engine. It involves shifting an entire installed database and its files to a new device or engine. A database migration is typically more difficult to achieve than a storage migration, because you need to move larger volumes of data, which might be formatted differently in the target. It typically requires backing up your databases, detaching each database from the source engine, moving the files, and then restoring them to the new location and engine.
- Application migration—the process of shifting an entire software application from one location to another, including all of its components such as databases, folders and installation files. It may require a combination of storage and database migrations. It may also require working with the application vendor to ensure the application functions properly post-migration.
Data Migration Strategies
Here are two commonly used data migration strategies.
Big Bang Migration
A big bang migration happens as a single event, during which live systems experience downtime while data goes through extract/transform/load (ETL) processing and moves to the new target database. Because everything moves at once, a failure affects the whole migration, so this approach usually carries a greater level of risk.
Trickle Migration
A trickle migration is carried out in phases, during which old and new systems run concurrently. This model avoids or minimizes operational interruptions and downtime. However, a trickle migration is often complex and time-consuming, which is why it requires careful planning.
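To make the phased idea concrete, here is a minimal sketch of moving records in small batches rather than all at once. The record shape, the batch size and the in-memory lists standing in for the old and new systems are all assumptions for illustration; a real trickle migration would also persist a checkpoint and keep the two systems in sync.

```python
from typing import Iterator, List, Sequence


def in_batches(records: Sequence[dict], batch_size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def trickle_migrate(source: Sequence[dict], target: List[dict], batch_size: int = 2) -> int:
    """Copy records to the target one batch at a time and return the batch count."""
    batches = 0
    for batch in in_batches(source, batch_size):
        target.extend(batch)  # stand-in for a write to the new system
        batches += 1          # a real job would record a checkpoint here
    return batches


# Hypothetical data: five records migrated in batches of two.
source_rows = [{"id": i} for i in range(5)]
target_rows: List[dict] = []
batch_count = trickle_migrate(source_rows, target_rows, batch_size=2)
print(batch_count, len(target_rows))  # prints "3 5"
```

Because each batch is small, a failure only affects the batch in progress, which is the main reason this style carries less risk than a single cutover.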
Data Migration Process
The data migration process involves the following phases.
Planning
Start with evaluating your data assets and building a migration plan. You can break down the planning process into these steps:
- Narrow the scope of migration—define the most critical data you need to migrate and filter out the excess. Consult with data users impacted by any proposed changes.
- Assess the source and target systems—analyze the source system and the target system to evaluate how you can adapt the operational requirements of your current system to the new environment.
- Set data standards—define the formats and quality rules the data must meet, so you can detect issues during and after migration.
- Set a timeline and budget—estimate the time and resources required to implement the migration process (this will differ according to the approach and scope of migration) and set deadlines and budgets.
Migration Design
Specify the rules, roles and acceptance criteria for migration and testing. Consider using data migration tools, such as extract, transform and load (ETL) tools, and hiring specialized developers or engineers to operate them. For unstructured data, consider processing the data to reduce its size and complexity before migration, for example using image compression algorithms, video APIs or minification for text-based data.
This phase involves creating data transition scripts and data mapping. A variety of developers, data engineers and business analysts may collaborate on the migration design. It is important to ensure the necessary tools are in place.
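As an illustration of what data mapping can look like in a transition script, here is a minimal sketch. The field names, the conversions and the sample record are all hypothetical; real mappings come out of the design work described above.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping: source field -> (target field, conversion function).
FIELD_MAP: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "cust_name": ("customer_name", str.strip),
    "signup": ("signup_date", lambda value: value.replace("/", "-")),
    "active": ("is_active", lambda value: value == "Y"),
}


def transform_record(source_record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename and convert each mapped field of one source record."""
    target_record: Dict[str, Any] = {}
    for source_field, (target_field, convert) in FIELD_MAP.items():
        target_record[target_field] = convert(source_record[source_field])
    return target_record


# Hypothetical legacy row, as it might be exported from the old system.
legacy_row = {"cust_name": "  Ada Lovelace ", "signup": "2021/03/04", "active": "Y"}
migrated_row = transform_record(legacy_row)
print(migrated_row)
```

Keeping the mapping in one table-like structure makes it easy for developers and business analysts to review the rules together before any data moves.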
Migrating and Testing
Execute the migration and implement continuous testing. You can use a big bang approach, which transfers all the data in a short window (often a day or two, depending on data volume) but requires downtime, or a trickle approach, which is much slower but minimizes downtime and reduces the risk of critical failure.
Regardless of the approach you implement, testing should be baked into the design and execution of your migration. In a trickle approach, each portion of data should be tested when it is migrated, to ensure problems are fixed early. Apply frequent tests to ensure the retention of data integrity, both during and after the migration.
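One common way to test data integrity for files is to compare checksums of the source and the migrated copy. The sketch below assumes both copies are reachable as local paths and uses SHA-256 from the Python standard library; the file names and contents are made up for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(source: Path, target: Path) -> bool:
    """Return True when both files have identical content."""
    return sha256_of_file(source) == sha256_of_file(target)


with tempfile.TemporaryDirectory() as workdir:
    source_file = Path(workdir) / "source.bin"
    target_file = Path(workdir) / "target.bin"
    source_file.write_bytes(b"example payload")
    target_file.write_bytes(b"example payload")
    intact = files_match(source_file, target_file)

    # Simulate corruption in the migrated copy.
    target_file.write_bytes(b"example payl0ad")
    corruption_detected = not files_match(source_file, target_file)

print(intact, corruption_detected)  # prints "True True"
```

In a trickle migration, a check like this can run after each portion is moved, so problems surface early.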
Real Life Example: Migrating Unstructured Data to Azure Storage
Microsoft Azure is the world’s second-largest cloud provider. Azure provides extensive migration resources that can help automate migration for both data and workloads – especially if your on-premises systems are based on Windows or Windows Server. Let’s see what the process looks like when migrating a large-scale, unstructured dataset to Azure storage.
Discovery
This phase involves determining which sources you need to migrate. For example, you might decide to migrate your SMB shares, object namespaces or NFS exports. You can implement this phase manually or leverage automated tools.
Assessment
This phase can help you understand which migration options are suitable for your scenario. Here are several steps you can implement for the assessment phase:
- Choose a target storage service—the storage type you choose should suit the application and users accessing the data. To determine this aspect, you should do two types of assessments—a technical assessment to determine possible targets and a financial assessment to determine a cost-effective option.
- Select a migration method—Azure lets you choose between online and offline migration. The online migration method consumes network bandwidth when migrating data, over either a public Internet connection or Azure ExpressRoute. Services without a public endpoint need to use a VPN over a public Internet connection. Offline migration requires the use of Azure Data Box devices. You can also combine the two methods.
- Choose a suitable migration tool—you can choose from a wide range of third-party, first-party and open-source migration tools. Notable free command-line options include AzCopy and rsync, which are open source, and xcopy and robocopy, which ship with Windows. You can combine native Azure Files capabilities with Azure File Sync to migrate from Windows file servers to Azure Files. Complex migrations might require the use of commercial tools.
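When weighing online against offline migration, a rough estimate of the network transfer time is a useful input. The sketch below is simple arithmetic, not Azure guidance: the 80% link utilization, the decimal terabyte and the 14-day threshold are all assumptions you should replace with your own figures.

```python
def estimate_transfer_days(data_terabytes: float, bandwidth_mbps: float,
                           utilization: float = 0.8) -> float:
    """Estimate days needed to push data over a link at a given utilization."""
    data_bits = data_terabytes * 1e12 * 8                  # decimal TB to bits
    effective_bits_per_second = bandwidth_mbps * 1e6 * utilization
    seconds = data_bits / effective_bits_per_second
    return seconds / 86_400                                # seconds per day


def suggest_method(transfer_days: float, max_online_days: float = 14.0) -> str:
    """Hypothetical rule of thumb: go offline if online would take too long."""
    return "online" if transfer_days <= max_online_days else "offline"


# Hypothetical scenario: 100 TB over a 1 Gbps link used at 80%.
days_at_1gbps = estimate_transfer_days(100, 1000)
print(round(days_at_1gbps, 1), suggest_method(days_at_1gbps))

# Same data over a 100 Mbps link.
days_at_100mbps = estimate_transfer_days(100, 100)
print(round(days_at_100mbps, 1), suggest_method(days_at_100mbps))
```

The estimate ignores protocol overhead, throttling and per-file costs, so treat the result as a lower bound when planning.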
Migration
This phase involves moving the data. To make the switchover easier, you should run the migration in several passes rather than as a single copy. Here are the main steps of the migration phase:
- Initial migration—migrating all data from the source to a specific target. Typically, this step is responsible for moving the bulk of the data chosen for migration.
- Resync—migrating data that was changed after the initial migration step. If there are many changes, you can repeat this step multiple times. Running multiple resync operations can help you reduce the overall time it takes to complete the final step.
- Final switchover—switching active usage from the source to the target data and then retiring the source.
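The initial migration and resync steps can be sketched with plain local folders standing in for the source and target. This is only an illustration of the pattern: the folder layout and file names are hypothetical, and content hashing is used to detect changes for simplicity, whereas real tools typically compare timestamps and sizes or use change logs.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file (reads it whole; fine for a demo)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_changed_files(source_dir: Path, target_dir: Path) -> List[str]:
    """Copy files that are missing or different in the target; return their paths."""
    copied: List[str] = []
    for source_path in sorted(source_dir.rglob("*")):
        if not source_path.is_file():
            continue
        relative = source_path.relative_to(source_dir)
        target_path = target_dir / relative
        if target_path.exists() and file_digest(target_path) == file_digest(source_path):
            continue  # already migrated and unchanged
        target_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_path, target_path)
        copied.append(relative.as_posix())
    return copied


with tempfile.TemporaryDirectory() as workdir:
    source_dir = Path(workdir) / "source"
    target_dir = Path(workdir) / "target"
    (source_dir / "sub").mkdir(parents=True)
    (source_dir / "a.txt").write_text("alpha")
    (source_dir / "sub" / "b.txt").write_text("beta")

    initial_pass = sync_changed_files(source_dir, target_dir)   # bulk copy
    (source_dir / "sub" / "b.txt").write_text("beta, edited after the initial pass")
    resync_pass = sync_changed_files(source_dir, target_dir)    # only the change
    final_check = sync_changed_files(source_dir, target_dir)    # nothing left

print(initial_pass, resync_pass, final_check)
```

Each resync pass moves less data than the one before, which is why the final switchover can be short. Note that this sketch does not handle files deleted from the source.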
Aspects that impact the duration of the migration for unstructured data include:
- The total size of the data—migration time grows roughly in proportion to the total volume of data to move.
- File size distribution—for the same total volume, many small files usually take longer to migrate than fewer large files, because each file adds per-file overhead. To reduce the total migration time, you can archive small files within larger files, for example .zip or .tar files.
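Bundling small files before transfer can be done with Python's standard tarfile module. The sketch below is a minimal example under stated assumptions: the 1 KB size threshold, the file names and the flat folder layout are hypothetical, and the archive is written outside the source folder so it does not pick itself up.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def bundle_small_files(source_dir: Path, archive_path: Path, max_bytes: int) -> List[str]:
    """Add every file of at most max_bytes to one tar archive; return their names."""
    bundled: List[str] = []
    with tarfile.open(archive_path, "w") as archive:
        for path in sorted(source_dir.iterdir()):
            if path.is_file() and path.stat().st_size <= max_bytes:
                archive.add(path, arcname=path.name)
                bundled.append(path.name)
    return bundled


with tempfile.TemporaryDirectory() as workdir:
    data_dir = Path(workdir) / "data"
    data_dir.mkdir()
    for index in range(3):
        (data_dir / f"note_{index}.txt").write_bytes(b"x" * 10)   # small files
    (data_dir / "large.bin").write_bytes(b"x" * 5000)             # left as-is

    tar_path = Path(workdir) / "small_files.tar"
    bundled_names = bundle_small_files(data_dir, tar_path, max_bytes=1024)

    # Read the archive back to confirm what it contains.
    with tarfile.open(tar_path) as archive:
        archived_names = sorted(archive.getnames())

print(bundled_names)
```

The target then receives one larger object instead of many tiny ones, at the cost of an unpacking step if the files are needed individually after migration.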
Conclusion
In this article, I explained the basics of migrating storage to the cloud, which requires an understanding of data migration types, data migration strategies and the overall data migration process. I also provided an example of data migration phases, including:
- Discovery phase—determining exactly which sources you want to migrate.
- Assessment phase—choosing a suitable target storage option, migration method, and migration tool.
- Migration phase—moving most data during the initial migration, cycling through multiple resync operations, and implementing the final switchover.
I hope this will be of help as you plan an efficient cloud migration that helps you smoothly and successfully move your data to the cloud.