How to turn an obstacle into the opportunity of building a reusable tool

We'll discuss the architecture of a Data Migration Manager built with OutSystems technology. This project was inspired by the urgent need to move data between databases in different environments; as we'll see, the platform can't easily help with that.
The suggested design and algorithms can be improved, and some choices depended on external factors. Still, we built a generic and effective tool, although a limited and not fully efficient one because of the REST services technology it relies on. OutSystems helped a lot in keeping costs down and in accelerating development.
What is OutSystems?
“OutSystems is a low-code platform that lets you visually develop your entire application, easily integrate with existing systems, and add your own custom code when you need it.”

This is the home page definition. We can add that the platform is a PaaS served by AWS, which can help a legacy modernization process.
Moreover, the platform has a feedback system, code management tools, and release and dependency management tools. This is why OutSystems is a strong DevOps-enabling platform, natively oriented to continuous integration and continuous delivery.
What is it for?

OutSystems is for visual development. It lets you build applications very fast, with little to no coding. Custom .NET code can be added when needed through extensions.
It is also a complete lifecycle manager: version control, release and dependency management, and even database release automation (think Liquibase or DBmaestro), still a touchy point.
So, in a few minutes we are able to build an application from scratch and release it to the production environment.
But what if I need to move data between environments?
OutSystems (in the cloud version) does not natively allow data migration.
In a perfect world, with a correct development process flow, we should not need this. But in the real world, honestly, it happens...
Friends don’t let friends manage servers

It's an AWS saying. Indeed, we can't access the database using administration tools: too risky. We can do it through a VPN, and OutSystems will support you in setting this up.
But we have two more ways:
- Infosistema Data Migration Manager: a commercial solution aimed at solving this problem. It also allows you to apply data transformations (ETL).
- The punk way, do it yourself: a home-made Data Migration Manager.
Self producer-self consumer model

We start from two points:
- Outsystems environments are publicly exposed on the internet.
- Applications may expose and use services.
Better: a dev instance can extract its data and pass it to the test instance using REST services. A model that we could call self producer-self consumer.
We can work on it. If Infosistema can do it, there must be a way 😊
The design could be:
Using this approach, every application must have its own migration logic, and that code will be replicated n times. We can do better.
We can encapsulate migration logic in a Data Migration Tool (we are going to call it GearDMT).
OutSystems, aiming at reuse, lets us define tables as “public” and import and use them inside other modules.
So every application will mark its tables as “public” and GearDMT will import them as dependencies.
Pros: generic solution, migration logic isolated.
Cons: inelegant; it violates several patterns and programming principles.
The design would become:
Other technical issues:
- Every time we have to migrate a new table, we will have to add a dependency in GearDMT and republish both applications.
- GearDMT needs to know the structure of the tables to migrate, so it will also have to import some system tables (Entity, Entity_Attr, …).
- GearDMT needs to run generic and dynamic queries, and it's impossible to write such queries using the platform's base tools.
It's time to get our hands dirty

We are lucky: we can integrate custom code using .NET extensions. So we can write a custom extension that exploits the platform's Runtime Public DB API to talk to the data layer.
- The extension is engaged while reading and writing data;
- it can easily run dynamic and generic queries involving unknown tables, inferring data types;
- it has low-level access to the database and to all of the platform's tables;
- it introduces decoupling;
- it manages transactions autonomously.
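To give an idea of the kind of dynamic, generic query the extension runs, here is a minimal Python sketch. SQLite stands in for the real platform database, and the table and data are invented; the actual extension is .NET code against the Runtime Public DB API.

```python
import sqlite3

# In-memory SQLite database standing in for the platform database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE File (Id INTEGER PRIMARY KEY, Label TEXT)")
conn.execute("INSERT INTO File (Label) VALUES ('immagine')")

def read_table(conn, table_name):
    # The table name is only known at runtime, so the query is built
    # dynamically; column metadata is inferred from the cursor description.
    cur = conn.execute(f"SELECT * FROM {table_name}")
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    return columns, rows

columns, rows = read_table(conn, "File")
print(columns)  # ['Id', 'Label']
print(rows)     # [{'Id': 1, 'Label': 'immagine'}]
```

The key point is that nothing about the table is hard-coded: structure and data both come back from a query built at runtime, which is exactly what the platform's visual query tools cannot express.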
We will also use the LifeTime Services API to get information from the platform about environments and installed applications.
A REST layer is necessary to allow the application instances to talk to each other. Furthermore, it can be used to populate tables from external sources.
Of course, we have to keep in mind that the services must be secured: they are a doorway to our database.
The process

Once the design is defined, the process is very easy. There are two main steps:
- verify that we can migrate, then load the data (CheckEnvAndLoadSourceTable)
- run the migration (MigrateTable)
CheckEnvAndLoadSourceTable

The first step in detail:
- verify that GearDMT is installed on the target environment as well (using a REST ping service);
- the source table must be present on the target environment, and the source and target table structures must be identical (checkDestTable compares hashes of JSON objects representing the table structures);
- load the data from the source table using readTable. This method calls the low-level methods exposed by the .NET extension.
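The structure comparison done by checkDestTable can be sketched as follows. This is illustrative Python, not the real implementation, and the attribute layout of the structure objects is an assumption:

```python
import hashlib
import json

def structure_hash(table):
    # Serialize the structure deterministically (sorted keys, fixed
    # separators) so that equal structures always produce equal hashes.
    canonical = json.dumps(table, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_dest_table(source_structure, target_structure):
    # Migration is allowed only if the two hashes match.
    return structure_hash(source_structure) == structure_hash(target_structure)

# Hypothetical JSON representation of a table's structure.
source = {"name": "File", "attributes": [
    {"name": "Id", "type": "Integer"},
    {"name": "Label", "type": "Text"},
]}
target = {"name": "File", "attributes": [
    {"name": "Id", "type": "Integer"},
    {"name": "Label", "type": "Text"},
]}
print(check_dest_table(source, target))  # True: structures match
```

Comparing hashes instead of the full JSON keeps the REST payload of the check small: only the hash has to travel between environments.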
MigrateTable

In the second step we call migrateTable to run the migration. It calls the postData method in the target environment, passing the table structures and the loaded data. postData in turn calls the .NET extension's low-level methods to write to the database.
This approach has obvious limitations: the data is passed entirely in a single HTTP POST request, so we risk timeouts for big tables (think about big files in blob columns).
True, we could migrate row by row, but this won't help much: we'd risk a huge number of almost concurrent HTTP requests. We can fix that by introducing synchronized request-response management: we start a new request only when the previous one has completed. Even so, we can't have effective transaction management: every row would be a single transaction and we couldn't execute a complete rollback. Not so nice.
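The synchronized row-by-row alternative, and why its transaction management is weak, can be sketched like this (toy Python; `post_row` is a hypothetical stand-in for the blocking REST call):

```python
def migrate_rows_sequentially(rows, post_row):
    # Send one row per request, starting the next request only after the
    # previous one has completed. On failure we stop, but rows already
    # posted are NOT rolled back: each one was its own transaction.
    migrated = []
    for row in rows:
        ok = post_row(row)  # blocking call: synchronized request-response
        if not ok:
            return migrated, False
        migrated.append(row)
    return migrated, True

sent = []
def post_row(row):
    # Hypothetical REST call; here it fails on row 3 to show the problem.
    if row["Id"] == 3:
        return False
    sent.append(row)
    return True

rows = [{"Id": i} for i in (1, 2, 3, 4)]
migrated, ok = migrate_rows_sequentially(rows, post_row)
print(ok, [r["Id"] for r in migrated])  # False [1, 2]: rows 1 and 2 stay written
```

Rows 1 and 2 remain in the target even though the migration as a whole failed, which is exactly the partial-rollback problem described above.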
So, all in all, the first approach seems to be the lesser evil.
Id matching strategies

One of the data migration goals is the id matching guarantee: we want record r with id x in the source environment to have a corresponding record r’, identical to r, with id x in the target environment. This is not trivial, because OutSystems uses autoincrement for primary keys.
We could disable autoincrement, insert the data, and enable autoincrement again, but unfortunately the OutSystems database user has no rights to perform such operations.
So a strategy to fix this issue could be “insert-delete until match”:
1. order the records to copy by id, ascending
2. write the first record
3. read the id assigned in the target environment
4. while target id < source id:
5. delete the just-inserted record
6. insert the record again
7. if target id > source id, throw an exception and roll back

It's a somewhat ragged, inelegant solution, but it works.
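The steps above can be simulated in a few lines of Python. The `AutoIncrementTable` class is a toy stand-in for a target table whose id counter we cannot reset, as in the OutSystems cloud:

```python
class AutoIncrementTable:
    # Toy target table: the primary key comes from an autoincrement
    # counter that deleting a row does NOT rewind.
    def __init__(self, next_id=1):
        self.next_id = next_id
        self.rows = {}

    def insert(self, data):
        assigned = self.next_id
        self.next_id += 1
        self.rows[assigned] = data
        return assigned

    def delete(self, row_id):
        del self.rows[row_id]

def insert_until_match(table, source_id, data):
    # "Insert-delete until match": burn ids until the counter reaches
    # the source id, keeping only the final insert.
    while True:
        assigned = table.insert(data)
        if assigned == source_id:
            return assigned          # counter caught up: keep this row
        table.delete(assigned)       # wrong id: discard the row
        if assigned > source_id:
            raise ValueError("target counter already past source id: roll back")

table = AutoIncrementTable()
for source_id, data in sorted({1: "a", 3: "c", 5: "e"}.items()):
    insert_until_match(table, source_id, data)
print(table.rows)  # {1: 'a', 3: 'c', 5: 'e'}
```

Note why the records must be ordered ascending: each discarded insert still advances the counter, so the gaps between source ids (2 and 4 here) are burned exactly once and every record lands on its source id.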
Let's also implement a “don't care” strategy that inserts records ignoring primary key values. This strategy will be useful for easily testing and debugging GearDMT, and in the future we can use it to feed tables from external sources.
Id resolution for static entities

In OutSystems you can create normal tables, called Entities, and Static Entities. A Static Entity is an entity with static data associated with it. This static data is managed at design time and can be used directly in the business logic of the application, benefiting from strong typing. Think of static entities as enums in Java. OutSystems takes care of static entity data migration when you promote an application from one environment to another.
But the primary keys may not be maintained.
Let's suppose we have a FileType static entity in dev.
At design time we realize that we won't ever need the "audio" file type and we decide to delete it. The static entity will look like this:
When we promote to the test environment, the records will be generated like this:
So if we use GearDMT to move all the other data from dev to test, all files with the "immagine" file type will become “video”. Moreover, inserting files with the “video” file type would cause a reference key violation.
The only way to fix this issue is to try to resolve the record's id in the target environment using the Label attribute.
This may be a record of the File entity in development:
Let's look in the target environment for the id of the FileType having label = “immagine”. The result would be Id = 2. And finally we can write the record:
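The lookup can be sketched as follows. Plain dictionaries stand in for the static entity tables, and the ids and labels are the illustrative values from the example above:

```python
# Static entity contents per environment: id -> label.
source_filetypes = {1: "immagine", 2: "video"}   # dev environment
target_filetypes = {1: "video", 2: "immagine"}   # test, after promotion

def resolve_static_id(source_id, source_table, target_table):
    # Find the target record carrying the same label as the source record.
    label = source_table[source_id]
    for target_id, target_label in target_table.items():
        if target_label == label:
            return target_id
    raise KeyError(f"no record with label {label!r} in target")

# Hypothetical File record referencing FileType id 1 ("immagine") in dev.
record = {"Id": 10, "Name": "photo.jpg", "FileTypeId": 1}
record["FileTypeId"] = resolve_static_id(record["FileTypeId"],
                                         source_filetypes, target_filetypes)
print(record["FileTypeId"])  # 2: "immagine" has id 2 in the target
```

The foreign key is rewritten before the record is posted, so it points at the right label even though the underlying ids diverged between environments.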
It's a weak strategy, but we have no other way.
Limits and preconditions

GearDMT is a data migration tool built to quickly meet our requirements at an extremely low cost. It does not pretend to be a full DMM; it is limited and can be improved. It can't be considered a reference architecture.
For it to function properly, some preconditions must be met:
- a standard primary key (a single autoincrement column called Id)
- if the insert-delete strategy is chosen, the target table must be empty and its id counter must be reset.
The application server bears the computational cost of the whole migration. This cost may be too high and may impact all the other applications running on the same server, causing delays or service outages.
All the data is moved by REST services using a single HTTP POST request.
Doing a POST request for each record could be an option, but then we wouldn't be able to manage the transaction effectively: in case of an exception, only the last record would be affected by the rollback instead of the whole dataset. The first option, however, has performance issues and is exposed to timeout errors when moving a huge amount of data.
Lastly, static entity id resolution, being based on the label value, is clearly exposed to failures, even if it would be really weird for a static entity to have two or more records with the same label value.
Evolutions and improvements

GearDMT's development has been an opportunity to deepen our OutSystems knowledge. We looked under the hood and customized the platform, and we successfully used the tool to migrate data in several cases.
There's still room to improve the tool:
- We could exploit the HTTP 102 Processing response to mitigate client timeout risks while moving a huge data set.
- We could introduce an ETL module to apply data transformations. If we design it to be extendable via plugins, we will be able to easily meet unpredictable future requirements.
- Lastly, GearDMT services can be called from any other application. In a legacy modernization scenario, if we are quickly re-engineering an obsolete application using OutSystems, we'll be able to use GearDMT services to feed the new database from a batch job or another external application.