How can I diff and patch/merge strings instead of files?

I’m working on a project where people can submit stories and have other people contribute to them. Rather than simply overwriting an entry in the database, I would like to store the changes people make rather than the entire new text. Then I can dynamically apply diffs if people want to revert to a previous version. I can also easily present users who are Editors with only the modified text, so that they can jump right to the changes.

I am aware of how to take diff files and patch other files with them. But I’m making a web app with Python and Django, and I’ll be storing all of these diffs in a MySQL database. Given that performance isn’t a major issue for this app, I am prepared to pull the data from the DB, make files, and run git diff and patch on those files.

Is there a better way than building new files and deleting them every time I want to create a new version or apply a new diff? Is there some way to run diffs on plain text instead of files? E.g., setting bash variables to the contents of what would be a file (but is actually data from the DB), and running git diff on them? I would like to control these actions from a Python file after the user submits a form.

    I’m really just looking for a good way to get started on this problem, so any help would be greatly appreciated.

    Thanks for your time,

    ParagonRG

2 solutions collected from the web for “How can I diff and patch/merge strings instead of files?”

    I did quite a bit of searching for a solution to this. Python’s difflib is quite capable, but the diffs it can rebuild text from must contain the entire original strings along with a record of what was changed. This differs from, say, a git diff, where you only see what was changed plus some surrounding context. difflib also provides a function called unified_diff, which does produce a shorter diff, but there is no matching function for rebuilding a string from a string and a diff. E.g., if I made a diff out of text1 and text2, called diff1, then I couldn’t generate text2 from text1 and diff1.
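
To make the difflib limitation concrete, here is a minimal sketch using only the standard library. The sample texts and variable names are made up for illustration. `difflib.ndiff` produces a delta that `difflib.restore` can turn back into either version, but that delta carries every line of both texts; `difflib.unified_diff` is compact, but difflib has no function that applies its output.

```python
import difflib

# Hypothetical sample data: twenty lines, with only line 5 edited.
old = ["line %d" % i for i in range(1, 21)]
new = list(old)
new[4] = "line 5 changed"

# ndiff keeps every line of both versions, so either side can be restored.
full_delta = list(difflib.ndiff(old, new))
rebuilt_old = list(difflib.restore(full_delta, 1))
rebuilt_new = list(difflib.restore(full_delta, 2))

# unified_diff keeps only the change plus a few context lines,
# but difflib offers no counterpart that applies it to a string.
short_diff = list(difflib.unified_diff(old, new, lineterm=""))

print(len(full_delta), "lines in the ndiff delta")
print(len(short_diff), "lines in the unified diff")
```

The size gap grows with the length of the text, which is why storing ndiff deltas in the database saves little over storing full copies.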

    I have therefore made a simple Python module that allows for strings to be rebuilt, both forwards and backwards, from a single string and its related diffs. It’s called merge_in_memory, and can be found at https://github.com/danielmoniz/merge_in_memory. Simply pull the repository and run the setup.py.

    A simple example of its usage:

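    import merge_in_memory as mim_module
    
    # Two versions of the same text, differing only in the second line.
    str1 = """line 1
    line 2"""
    str2 = """line 1
    line 2 changed"""
    
    # diff_make() returns the diff between the two strings as a single string.
    merger = mim_module.Merger()
    print(merger.diff_make(str1, str2))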
    

    This will output:

    --- 
    +++ 
    @@ -1,2 +1,2 @@
     line 1
    -line 2
    +line 2 changed
    

    Diffs are simply strings (rather than generators, as they are when using difflib). You can create a number of diffs and apply them all at once (i.e. fast-forward through a history or track back) with the diff_apply_bulk() function.

    To step backwards through the history, pass reverse=True when calling either diff_bulk() or diff_apply_bulk(). For example:

    # merger is the Merger instance from the earlier example
    merge = merger.diff_apply_bulk(text3, [diff1, diff2], reverse=True)
    

    If you started with text1 and generated text2 and text3 by applying diff1 and diff2, then the line of code above rebuilds text1. Note that the list of diffs is still in ascending order. A ‘merge’, i.e. the result of applying a diff to a string, is itself a string.
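
For readers who want to see the idea without installing anything, the following is a rough standard-library sketch of applying unified diffs to strings in either direction. It is not the code of merge_in_memory: the names `make_diff`, `apply_diff` and `apply_diffs` are hypothetical, and the sketch assumes the diffs were produced by `make_diff` itself. It also joins lines with "\n", so a trailing newline in the input is not preserved.

```python
import difflib
import re

# Matches hunk headers such as "@@ -1,2 +1,2 @@" or "@@ -0,0 +1 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def make_diff(old, new):
    """Return the unified diff between two strings as one string."""
    lines = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return "\n".join(lines)


def apply_diff(text, diff, reverse=False):
    """Apply a diff made by make_diff(); reverse=True undoes it instead."""
    source = text.splitlines()
    # Going backwards, "+" lines are the ones to drop and "-" lines come back.
    drop, add = ("+", "-") if reverse else ("-", "+")
    result, pos, in_hunk = [], 0, False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            in_hunk = True
            start = int(header.group(3 if reverse else 1))
            count = header.group(4 if reverse else 2)
            # A zero-length range names the line before the change, so it is
            # already a valid 0-based index; otherwise convert from 1-based.
            if count != "0":
                start -= 1
            result.extend(source[pos:start])  # copy untouched lines
            pos = start
        elif not in_hunk:
            continue  # "---" / "+++" file header lines
        elif line[:1] == add:
            result.append(line[1:])
        elif line[:1] in (" ", drop):
            if pos >= len(source) or source[pos] != line[1:]:
                raise ValueError("diff does not match text at line %d" % (pos + 1))
            if line[:1] == " ":
                result.append(source[pos])  # context line is kept
            pos += 1
    result.extend(source[pos:])
    return "\n".join(result)


def apply_diffs(text, diffs, reverse=False):
    """Apply diffs in order; when reversing, walk the same list backwards."""
    for diff in (reversed(diffs) if reverse else diffs):
        text = apply_diff(text, diff, reverse=reverse)
    return text


text1 = "line 1\nline 2"
text2 = "line 1\nline 2 changed"
text3 = "line 0\nline 1\nline 2 changed\nline 3"
diff1 = make_diff(text1, text2)
diff2 = make_diff(text2, text3)

print(apply_diffs(text1, [diff1, diff2]) == text3)
print(apply_diffs(text3, [diff1, diff2], reverse=True) == text1)
```

As in the description above, the list of diffs stays in ascending order for both directions; only the `reverse` flag changes. Because each hunk's context and removed lines are checked against the text, applying a diff to the wrong version raises an error rather than silently producing garbage.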

    All of this allows me to store diffs in the database as simple VARCHARs (or what-have-you). I can pull them out in order and apply them in either direction to generate the text I want, as long as I have a starting point.

    Please feel free to leave any comments about this, as it is my first Python module.

    Thanks,

    ParagonRG

    Have a look at libgit. It is a C library, with bindings for many other languages, that lets you manipulate a git repository in various ways.

    It seems pretty low-level so getting it to actually commit, diff and so on might be tedious, but it does at least have a function to add a blob to the repo without it needing to be on disk.

    The alternative of course is to create a normal file-based repository and working copy and bounce stuff back and forth between the database and file system using os.system calls.
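
A rough sketch of that file-based route, assuming a Unix-like system with the `diff` command on the PATH. The helper name `diff_strings_via_files` and the file names are hypothetical, and it uses `subprocess` and a temporary directory rather than `os.system` so that the files are cleaned up automatically:

```python
import os
import shutil
import subprocess
import tempfile


def diff_strings_via_files(old, new):
    """Write both strings to temporary files and return `diff -u` output."""
    if shutil.which("diff") is None:
        raise RuntimeError("the 'diff' command is not available on this system")
    with tempfile.TemporaryDirectory() as workdir:
        old_path = os.path.join(workdir, "old.txt")
        new_path = os.path.join(workdir, "new.txt")
        for path, text in ((old_path, old), (new_path, new)):
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text + "\n")  # end each file with a newline
        proc = subprocess.run(
            ["diff", "-u", old_path, new_path],
            capture_output=True,
            encoding="utf-8",
        )
    # diff exits with 0 (no differences) or 1 (differences); 2 means trouble.
    if proc.returncode > 1:
        raise RuntimeError(proc.stderr)
    return proc.stdout
```

The returned string can be stored in the database as-is. Its first two lines name the temporary files, so they differ from run to run and are best ignored or stripped before comparing diffs.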
