Generic version control strategy for select table data within a heavily normalized database

Sorry for the long winded title, but the requirement/problem is rather specific.

With reference to the following sample (but very simplified) structure (in psuedo SQL), I hope to explain it a bit better.

  • How can I get all the data types specific for a certain version of MS Access or/and every versions of MS Access?
  • Database structure and source control - best practice
  • What are the best practices for database scripts under code control
  • Database Schema Migration on Azure with Git Deployment
  • How do you maintain revision control of your database structure?
  • How to handle multiple db alter scripts coming from different Git feature branches?
  • TABLE StructureName {
      Id GUID PK,
      Name varchar(50) NOT NULL
    }
    
    TABLE Structure {
      Id GUID PK,
      ParentId GUID,                 -- FK to Structure
      NameId GUID NOT NULL           -- FK to StructureName
    }
    
    TABLE Something {
      Id GUID PK,
      RootStructureId GUID NOT NULL  -- FK to Structure
    }
    

    As one can see, Structure is a simple tree structure (not worried about ordering of children for the problem). StructureName is a simplification of a translation system. Finally ‘Something’ is simply something referencing the tree’s root structure.

    This is just one of many tables that need to be versioned, but this one serves as a good example for most cases.

    There is a requirement to version to any changes to the name and/or the tree ‘layout’ of the Structure table. Previous versions should always be available.

    There seems to be a few possibilities to tackle this issue, like copying the entire structure, but most approaches causes one to ‘loose’ referential integrity. Example if one followed this approach, one would have to make a duplicate of the ‘Something’ record, given that the root structure will be a new record, and have a new ID.

    Other avenues of possible solutions are looking into how Wiki’s handle this or go a lot further and look how proper version control systems work.

    Currently, I feel a bit clueless how to proceed on this in a generic way.

    Any ideas will be greatly appreciated.

    Thanks

    leppie

  • Using Git for collaboration on a 4 person project
  • Restore uncommitted work after checkout and reset
  • How do I copy part of a commit diff in GitHub without the pluses and the minuses?
  • Which Version Control for Drupal?
  • Is it possible to have a tracked .gitignore AND an untracked .gitignore?
  • Branch/change not merged, though Gerrit claims, it is
  • 2 Solutions collect form web for “Generic version control strategy for select table data within a heavily normalized database”

    Some quick ideas:

    Full copy: Create a copy of the structure, but for every table add a version_id column to the PK and all FKs; thus you can create copies of the life data with complete referential integrity.

    • pro: easy to query the history
    • con: large amount of (redundant data copied)

    Change copy: Only copy the stuff that actually changes, along with valid_from / valid_to data.

    • pro: low data volum copied
    • con: hard to query, because one has to join on intervals

    Variation: This applies to both schemes. Instead of creating a copy of the structure, you might keept the current record in the same table as the old versions, but tag it as current.

    • pro: smaller number of tables, easier mixing of history and current information
    • con: normal operation operates on much bigger tables, which will cause a performance impact

    Auditing log: Depending on your actual requirements it be sufficient to just create an audit trail like this:

    id,  timestamp,  changed_table,  changed_column,  old_value,  new_value,  changed_by
    

    You might extend that to a full table structure:

    transaction,  table_change,  changed_column
    
    • pro: generic, hence easy to implement for a large number of tables
    • con: if you need to reconstruct the state of a set of records at a given time, querying will become a nightmare

    I wrote a blog about various approaches to versioning, but be warned: it’s in German.

    The data warehousing folks have several algorithms for “slowly-changing dimensions”.

    The more sophisticated algorithms provide data ranges around a dimension value to indicate when it’s valid.

    Depending on your versioning requirements you could do one of these things, cribbed from Kimball’s The Data Warehousing Toolkit.

    1. Assign a version number to rows of the structure table. This means you have to do some reasoning to collect a a complete structure. It includes the selected version number unioned with rows that are unchanged in an earlier version.

    2. Assign a date range or version range to rows of the structure table. This means that some rows have start dates and end dates; some rows will have end dates at some epoch in the impossible future. Or, if you use version numbers, you’ll have a start-end pair or a start-infinity pair that indicates this row is still current. You can then trivially query the rows that are valid “today” or apply to the requested version.

    3. Clone the structure for each version. This unpleasant because the clone operation is costly. The queries however, are trivial because the entire structure is available with a single, consistent version number.

    Git Baby is a git and github fan, let's start git clone.