Versioning Mysql Data (Not Just Schema)
There has been talk at my office about creating a package that would version control mysql data (not schemas/migrations).
Basically the process would work like this. Keep in mind the client still uses the backend as normal, image then using it just like a wordpress backend. Client would login select a “branch” give it a name lets say “new users” this would clone a completely new database allowing the user to work on there “branch” without effecting live. Once the client is done making data changes they would merge there data branch into the “master”(live).
Under the hood when merging it would export both live and “new users” branches data to a sql file and do an svn diff and merge the changes.
The situation that arose that enticed the idea was if we have clients that need to make a bunch of changes to there site but dont want to put that data live and while they make changes they dont want to effect other coworkers site changes either. Basically replicated what developers do when working in repositories like Git.
Also if the client works on a dev/demo site the work they do they want to put live.
I wanted to open the discussion to understanding if this is even a good idea?
What problems we may run into?
Is this a good programming practice when working with data?
Does something like this already exist?
2 Solutions collect form web for “Versioning Mysql Data (Not Just Schema)”
Database (especially their data) are rarely stored in a version control system because it doesn’t scale well for large databases.
In your case, if you have not too much data, that could work, especially since a
mysqldump can produce a delimited text format (which has a chance to diff against the previous version)
I would still recommend a separate git repo and a dedicated tool to manage both schema and data changes. For instance, LiquidBase can provide “source control for your database”.
You also have, as a dedicated specialized database: off-scale.
If you were to do this manually, then you have good practices summarized in “Recipes for Continuous Database Integration”.
As mentioned here, even for schema:
I learned the hard way that applying database schema changes cannot be reliably done without a comprehensive step-by-step plan and, similarly, the order of relationship dependencies are important.
Just storing the “current” or “end” schema is not sufficient. There are many changes that cannot be retroactively applied
A->B->Cand some changes
Bmight involve migration logic or corrections.
Is your requirements as simply as following:
- Same back-end code;
- Use master data in live;
- End-users (or group) to work on isolated data;
- No migration, end-users can only modify the data (DML), and are not allowed to modify the schema (DDL);
If these are the requirements, then you can use multiple databases. Consider the following databases in a MySQL server:
These databases share the exact same schema, and only the data is different. In each branch, we have a special table (i.e. dbinfo), to keep track of the parent branch (probably masterdb), create datetime, and other details, like access-level, etc.
You can allow end-users to work on separate branches, by simply allowing them to select a particular database in the UI, where masterdb is selected as default, and is used in LIVE.
- Creating a new branch, would be as simple as cloning a database;
- Merging a newer branch to master, could be handled using the REPLACE statement in MySQL;
If you want to keep track of changes in the data, you can create a special table to log the activities.