What's the best practice for handling system-specific information under version control?
I’m new to version control, so I apologize if there is a well-known solution to this. For this problem in particular, I’m using git, but I’m curious about how to deal with this for all version control systems.
I’m developing a web application on a development server. I have defined the absolute path name to the web application (not the document root) in two places. On the production server, this path is different. I’m confused about how to deal with this.
I could either:
- Reconfigure the development server to share the same path as the production server.
- Edit the two occurrences each time production is updated.
I don’t like #1 because I’d rather keep the application flexible for any future changes. I don’t like #2 because if I start developing on a second development server with a third path, I would have to change this for every commit and update.
What is the best way to handle this? I thought of:
- Using custom keywords and variable expansion (such as setting the property $PATH$ in the version control properties and having it expanded in all the files). Git doesn’t support this because it would be a huge performance hit.
- Using post-update and pre-commit hooks. Probably the likeliest solution for git, but every time I looked at the status, it would report the two files as changed. Not really clean.
- Pulling the path from a config file outside of version control. Then I would have to have the config file in the same location on all servers. Might as well just have the same path to begin with.
Is there an easy way to deal with this? Am I overthinking it?
6 solutions collected from the web for “What's the best practice for handling system-specific information under version control?”
Do not EVER hard-code configuration data like file system paths and force multiple deployments to match. That is the dark side, where there is much SUFFERING.
I find it useful and easy to build my systems to support multiple configurations, and I routinely commit configuration files into source control side-by-side, but production’s is obfuscated (no real passwords) and development’s is templated (so a checkout can’t overwrite a developer’s configuration). The code is always packaged in a configuration-neutral manner: the same binary can be deployed anywhere.
Unfortunately, most language/development platforms do not readily support this (unlike Ruby on Rails). Therefore, you have to build it yourself, to varying degrees.
In general, the basic principle is to incorporate indirection into your configuration: specify not the configuration, but how to find the configuration, in your code. And generally invoke several indirections: user-specific, application-specific, machine-specific, environment-specific. Each should be found in a well-defined place/manner, and there should be a very-well-defined precedence among them (usually user over machine over application over environment). You will generally find that every configurable setting has a natural home in one location, but don’t hard-code that dependency into your applications.
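As a minimal sketch of that precedence, assuming each layer has already been loaded into a plain dictionary: the function name, the keys, and the paths below are hypothetical, and the order simply follows the "user over machine over application over environment" rule above.

```python
from collections import ChainMap


def resolve_settings(user, machine, application, environment):
    """Merge four configuration layers; earlier arguments take precedence."""
    # ChainMap searches its maps left to right, so the first layer
    # that defines a key supplies the value.
    return dict(ChainMap(user, machine, application, environment))


# Hypothetical layer contents, for illustration only.
settings = resolve_settings(
    user={"log_level": "debug"},
    machine={"app_root": "/srv/dev/myapp"},
    application={"log_level": "info", "page_size": 25},
    environment={"app_root": "/var/www/myapp", "page_size": 50},
)
print(settings)
```

Where each layer actually lives (a dotfile, a file under /etc, an environment variable) is a separate decision; the rest of the application only ever sees the merged result.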
I find that it is VERY valuable to design applications to be able to report their configuration, and to verify it. In most cases, a missing or invalid configuration item should result in aborting the application. As much as possible, perform that verification (and abort) at startup, i.e. fail fast. Hard-code defaults only when they can reliably be used.
Abstract the configuration access so that most of the application has no idea where it comes from or how it is processed. I prefer to create Config classes that expose configurable settings as individual properties (strongly typed when relevant), then I “inject” them into application classes via IOC. Do not make all your application classes directly invoke the raw configuration framework of your chosen platform; abstraction is your friend.
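One way this could look, as a rough sketch rather than a prescribed design: a small typed config class that validates itself when it is built (so a bad setting aborts startup) and is handed to the classes that need it. The names AppConfig, ConfigError, UploadService, APP_ROOT and MAX_UPLOAD_MB are all made up for the example, and plain constructor injection stands in for an IoC container.

```python
from dataclasses import dataclass


class ConfigError(Exception):
    """Raised at startup when a required setting is missing or invalid."""


@dataclass(frozen=True)
class AppConfig:
    app_root: str
    max_upload_mb: int

    @classmethod
    def from_mapping(cls, raw):
        """Build a validated config from raw string settings, or raise ConfigError."""
        missing = [key for key in ("APP_ROOT", "MAX_UPLOAD_MB") if key not in raw]
        if missing:
            raise ConfigError("missing settings: " + ", ".join(missing))
        if not raw["APP_ROOT"]:
            raise ConfigError("APP_ROOT must not be empty")
        try:
            max_upload_mb = int(raw["MAX_UPLOAD_MB"])
        except (TypeError, ValueError):
            raise ConfigError("MAX_UPLOAD_MB must be an integer") from None
        return cls(app_root=raw["APP_ROOT"], max_upload_mb=max_upload_mb)


class UploadService:
    """Application class that receives its configuration instead of looking it up."""

    def __init__(self, config):
        self._config = config

    def upload_dir(self):
        return f"{self._config.app_root}/uploads"


# At startup: build the config once, then pass it to whatever needs it.
config = AppConfig.from_mapping({"APP_ROOT": "/var/www/myapp", "MAX_UPLOAD_MB": "20"})
service = UploadService(config)
print(service.upload_dir())
```

UploadService never learns whether the values came from a file, the environment, or a test fixture, which is the point of the abstraction.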
In most enterprise-class (Fortune 500) organizations, no one sees the production (or even test) environment configurations except the admin team for that environment. Configuration files are never deployed in a release, they are hand-edited by the admin team. The relevant configuration files certainly never get checked into source control side-by-side with the code. The admin team may use source control, but it is their own private repository. Sarbanes-Oxley and similar regulations also tend to strictly forbid developers from having general access to (near-)production systems or any sensitive configuration data. Be mindful as you design your approach.
You should always separate history keeping (what source control is for) from deployment.
A deployment involves:
- an identified set of data (for which a tag or label provided by the SCM comes in handy)
- a process manipulating those data (for at least copying them at the right place, but also expanding some compressed files, and so on…)
Amongst the various operations a deployment performs, you should include a de-variabilization phase.
A variable is a keyword representing anything likely to change depending on your deployment platform (which can be a PC for continuous integration, a Linux box for basic homologation, an old Solaris8 for pre-production homologation, and a Full F15K Solaris10 with zones for production: in short, it can vary a lot). See Jonathan Leffler’s answer for practical examples.
A variable can represent a path, a JVM version, some JVM settings and so on, and what you put in an SCM should be data with variables in it, never hard-coded settings.
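A de-variabilization step can be sketched with Python's standard string.Template, assuming the committed file uses ${NAME} placeholders. The placeholder names, the template text and the values are invented for the example; a real deployment would read the template from the tagged checkout and the values from the target platform.

```python
from string import Template


def devariabilize(template_text, variables):
    """Expand ${NAME} placeholders using the values for one deployment platform."""
    # substitute() raises KeyError if a placeholder has no value,
    # so an incomplete platform description stops the deployment.
    return Template(template_text).substitute(variables)


# Hypothetical template as it would be stored in the SCM.
template = "root = ${APP_ROOT}\njvm_opts = ${JVM_OPTS}\n"

# Hypothetical values for one target platform.
production = {"APP_ROOT": "/var/www/myapp", "JVM_OPTS": "-Xmx512m"}

print(devariabilize(template, production))
```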
The next step would be to include in your executable a way to detect any change in a settings file, in order to update some parameters while running (avoiding the whole “shutdown / change settings / restart” sequence).
That means there are two types of deployment variables:
- static ones (which will never change),
- dynamic ones (which should be ideally taken into account during the runtime session)
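For the dynamic kind, one simple approach (an assumption here, not something the answer prescribes) is to check the settings file's modification time before each read and reload it when it has changed. The class name, the JSON format and the log_level key are illustrative only.

```python
import json
import os
import tempfile


class ReloadingSettings:
    """Settings backed by a JSON file that is re-read whenever the file changes."""

    def __init__(self, path):
        self._path = path
        self._mtime = None
        self._values = {}
        self.refresh()

    def refresh(self):
        """Reload the file if its modification time changed; return True if reloaded."""
        mtime = os.stat(self._path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self._path, encoding="utf-8") as handle:
            self._values = json.load(handle)
        self._mtime = mtime
        return True

    def get(self, key):
        self.refresh()
        return self._values[key]


# Demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"log_level": "info"}, handle)
    settings = ReloadingSettings(path)
    before = settings.get("log_level")

    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"log_level": "debug"}, handle)
    # Push the timestamp forward in case the filesystem clock is coarse.
    stamp = os.stat(path).st_mtime_ns + 1_000_000_000
    os.utime(path, ns=(stamp, stamp))
    after = settings.get("log_level")

print(before, after)
```

Polling on every read is the crudest option; a timer or a file-watching facility would avoid the extra stat call, at the cost of more moving parts.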
Avoid absolute paths wherever possible.
Don’t rely on your current version control to do something magic – you may change version control systems in the future.
The simplest approach works for me: keep a ‘config.live’ next to a ‘config’ that is set up for development. During deployment simply move config.live to config and all is fine. For more complex configurations a sub-directory for each configuration may be required.
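As a sketch of that deployment step, assuming both files sit in the deployed directory: the function name is hypothetical, and the file contents in the demonstration are placeholders.

```python
import os
import tempfile


def promote_live_config(deploy_dir, live_name="config.live", active_name="config"):
    """Replace the development config with the live one in a deployed tree."""
    live = os.path.join(deploy_dir, live_name)
    if not os.path.isfile(live):
        raise FileNotFoundError(f"{live} is missing; refusing to deploy")
    # os.replace overwrites the destination if it already exists.
    os.replace(live, os.path.join(deploy_dir, active_name))


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as deploy_dir:
    files = (("config", "root=/home/dev/myapp\n"), ("config.live", "root=/var/www/myapp\n"))
    for name, text in files:
        with open(os.path.join(deploy_dir, name), "w", encoding="utf-8") as handle:
            handle.write(text)
    promote_live_config(deploy_dir)
    with open(os.path.join(deploy_dir, "config"), encoding="utf-8") as handle:
        active = handle.read()
    remaining = sorted(os.listdir(deploy_dir))

print(active, remaining)
```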
A set of deployment procedures is essential – as the configuration is only one area that will be different.
Anything more complex is almost certain to cause more problems than it solves.
Use an SCM such as Git for version control and a deployment tool such as Capistrano for deployment. Although Capistrano was initially created for Ruby on Rails it’s perfectly fine to use it for other frameworks and languages.
The main thing is that a dedicated deployment tool will give you all the flexibility you need to automate things like paths on both ends.
I like the way Ruby on Rails deals with this sort of issue: environment-specific configuration files. Rails supports development, test, and production database connections, controlled by configuration in the database.yml file. Here is a blog post about creating other environment-specific configuration options. It is for Rails, but it might give you some ideas about how to do something similar for your environment. http://usablewebapps.com/2008/09/yaml-and-custom-config-for-rails-projects/
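Outside Rails, the same idea can be approximated with one file holding a section per environment and a single switch that picks the section. The sketch below uses JSON instead of YAML to stay within the Python standard library; the APP_ENV variable name, the keys and the values are assumptions made for the example, not Rails conventions.

```python
import json

# Hypothetical per-environment settings, standing in for a committed config file.
CONFIG_TEXT = """
{
  "development": {"database": "myapp_dev",  "app_root": "/home/dev/myapp"},
  "test":        {"database": "myapp_test", "app_root": "/tmp/myapp"},
  "production":  {"database": "myapp",      "app_root": "/var/www/myapp"}
}
"""


def settings_for(environment, config_text=CONFIG_TEXT):
    """Return the settings section for one named environment."""
    sections = json.loads(config_text)
    if environment not in sections:
        raise KeyError(f"unknown environment: {environment!r}")
    return sections[environment]


def current_settings(environ):
    """Pick the section named by APP_ENV, defaulting to development."""
    return settings_for(environ.get("APP_ENV", "development"))


print(current_settings({"APP_ENV": "production"})["app_root"])
```

In a real application the mapping passed to current_settings would be os.environ, so each server only has to set one variable.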
Sounds like your production code is a full-on git repository and to update production you do a git pull? You might want to try a separate build process that checks the code out of your repository and creates a clean build (no .git folder). You could have environment-specific config files which contain your paths that are copied or created along with it.
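A build step along those lines might look roughly like this. The directory layout (a config/ folder with one .cfg file per environment, copied to app.cfg) and the function name are invented for the example; in practice git archive or a deployment tool could do the export instead of a plain directory copy.

```python
import os
import shutil
import tempfile


def build_release(checkout_dir, build_dir, environment):
    """Copy a checkout without its .git folder and install one environment's config."""
    # build_dir must not exist yet; copytree creates it.
    shutil.copytree(checkout_dir, build_dir, ignore=shutil.ignore_patterns(".git"))
    source = os.path.join(build_dir, "config", f"{environment}.cfg")
    shutil.copyfile(source, os.path.join(build_dir, "app.cfg"))
    return build_dir


# Demonstration with a fake checkout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    checkout = os.path.join(tmp, "checkout")
    os.makedirs(os.path.join(checkout, ".git"))
    os.makedirs(os.path.join(checkout, "config"))
    cfg_path = os.path.join(checkout, "config", "production.cfg")
    with open(cfg_path, "w", encoding="utf-8") as handle:
        handle.write("app_root=/var/www/myapp\n")

    build = build_release(checkout, os.path.join(tmp, "build"), "production")
    has_git = os.path.exists(os.path.join(build, ".git"))
    with open(os.path.join(build, "app.cfg"), encoding="utf-8") as handle:
        deployed = handle.read()

print(has_git, deployed)
```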