Why Data Movement Speed Matters

At Wirekite, we have optimized our implementations around an ethos of speed, sometimes at the expense of other software engineering goals. We don’t use the off-the-shelf data formatting libraries and other tools commonly used in the data movement space, and we aren’t afraid to “reinvent the wheel” when we feel we need a better wheel.
Speed Matters in Data Migration Projects
In our experience, the faster physical data movement is, the less complex the overall project ends up being, and the higher its chance for success.
For example, consider a very large data migration project from an existing on-premise data and application world to a new data world, whether it’s in “the cloud” or a new on-premise environment. In such a migration, you have a lot of moving parts to consider:
You have to migrate the database schema from the old database technology to the new one, and figure out how that migration will be done.
Application data layers will have to be recoded to work with the new dataworld.
You have to physically move existing data from the old dataworld to the new.
As bulk data movement isn’t instantaneous, you will need some scheme for “catching up” the new world with the changes that happened in the old world while the initial data was being extracted, transferred, and loaded.
Most organizations with skilled developers and operations people can handle the schema, application, and initial data movement parts, although often with manual processes that can be tedious and error-prone. They can usually make these work, assuming project scope-creep is kept to a minimum.
Where things get icky is in the catch-up phase. If the extract, transfer, and load takes many hours or days, as it often does, you have various choices:
A lengthy downtime of your production environment, so that no new changes accumulate while the base data is extracted, transferred, and loaded to the new environment. You switch your apps over after everything has been loaded (and validated, etc.). It’s a “clean” cutover, but may involve hours or days of downtime.
App-coded “catch-up logic”, using app-specific mechanisms, such as copying over a user’s history when the app encounters activity from that user (and keeping metadata somewhere that records which users have been migrated); a rough sketch of this pattern appears after this list. This is extremely hard to get right, and the migration may take weeks or months, so you’ll have to run two live environments until you finally decide that enough users have been migrated.
Coding your application data layer so it “mirrors” every change to both the old and the new environment. This is also very hard to get right.
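To make the second option concrete, here is a deliberately simplified sketch of that lazy, per-user catch-up pattern. Plain dictionaries stand in for the old and new dataworlds, and all the names are hypothetical; a real implementation also has to deal with writes that arrive while a user’s history is being copied, partial failures, and retries, which is exactly where this approach tends to go wrong.

```python
# Simplified, hypothetical sketch of app-coded per-user "catch-up" migration.
# Dictionaries stand in for the old and new dataworlds; real code would be
# talking to two databases and coping with concurrent writes and failures.

old_world = {"alice": ["order-1", "order-2"], "bob": ["order-3"]}
new_world = {}            # the new dataworld's copy of each user's history
migrated_users = set()    # metadata recording which users have been migrated

def handle_user_activity(user_id):
    """Called whenever the app sees activity from a user."""
    if user_id not in migrated_users:
        # First activity since the migration began: copy this user's entire
        # history from the old world to the new one, then mark them migrated.
        new_world[user_id] = list(old_world.get(user_id, []))
        migrated_users.add(user_id)
    # From here on, reads and writes for this user must go to the new world,
    # while not-yet-migrated users are still served from the old one.
    return new_world[user_id]

print(handle_user_activity("alice"))   # ['order-1', 'order-2']
```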
The best scenario is a data movement solution that can do the initial data move and also replicate both the backlogged and the new changes from your old environment to the new one quickly enough that the two environments are fully “synched” within a reasonable timeframe. But pulling this off takes a lot of “extra” performance: the change propagation can’t just be a bit faster than the change rate in prod.
If that last scenario can happen, you avoid data-related downtime and just need to “flip a switch” to go from your old app and dataworld to the new world. Other than possibly a blip while the switch is flipped, you won’t have user-facing downtime, and your app developers can focus on the already-nontrivial task of getting the data layer to work in the new world, without the transitional app logic or lengthy downtime of the previous scenarios.
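To see why a little extra speed isn’t enough, here is a back-of-the-envelope calculation with made-up numbers. While the initial load runs, production keeps changing, so a backlog builds up; the new environment only converges if changes are applied faster than they are produced, and the size of that headroom determines how long catch-up takes.

```python
# Back-of-the-envelope catch-up math; all numbers below are made up.

initial_load_hours = 48     # time to extract, transfer, and load the bulk data
change_rate = 5_000         # production changes per hour
apply_rate = 6_000          # changes per hour replication can apply (1.2x prod)

# Changes that queued up in the old world during the initial load.
backlog = initial_load_hours * change_rate

# Only the headroom (apply_rate - change_rate) drains the backlog,
# because new production changes keep arriving while you catch up.
catch_up_hours = backlog / (apply_rate - change_rate)
print(f"Backlog after the initial load: {backlog:,} changes")
print(f"Catch-up time at 1.2x the prod change rate: {catch_up_hours:.0f} hours")  # 240 hours

apply_rate_fast = 50_000    # a much faster replication pipeline (10x prod)
catch_up_fast_hours = backlog / (apply_rate_fast - change_rate)
print(f"Catch-up time at 10x the prod change rate: {catch_up_fast_hours:.1f} hours")  # ~5.3 hours
```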
Speed Matters in Data Warehouse Projects
Data warehouse projects differ from the data migration projects described above in that the production-facing dataworld will not be shut down, and the data warehouse will be used for reporting and analytics rather than customer-facing applications.
But the basic migration problem is similar: you have to get bulk data transferred to the data warehouse, and have some way to keep the data warehouse at least reasonably synchronized with prod.
Many organizations implement strategies such as once-a-week reloads of their entire dataset into the data warehouse. This is expensive and error-prone, and it means there may be a several-day lag between current production and your reporting and analytics, so you can miss emerging trends in your data that matter to the business.
If you have fast change propagation, your analytics and reporting world may only lag production by a few seconds or minutes, and you can detect trends in your data as soon as they appear.
Speed and Wirekite
Our focus on speed gives Wirekite customers simpler, more successful engineering processes for data rehosting efforts, and truly fresh, up-to-date data in data warehouses and other analytics environments.