Friday, June 19, 2009

Team Foundation Server: Some thoughts on source control branching strategies

I thought I'd just write a short note on some of the source control branching strategies employed and how these relate to how I might use my preferred source control repository, Team Foundation Server (TFS).

I have been working with a customer who are using the IBM Rational toolset for just about everything and I have been ranting about how bad it is and how great TFS is, but one of the things that has given me cause to think is the way that the customer uses the ClearCase source control for branching and merging.

This has echoes with another customer I worked with last year (very large UK bank) that was using Harvest - now a CA product - for their source control. This was also working in a similar way and I was considering how this contrasted with the way that we usually use TFS.

Streaming Source Control Model

The first thing that I would say is that TFS is tuned for developer productivity whereas IBM Rational ClearCase is tuned for control. This in itself makes for some interesting differences, which I may expound on in a later blog post. The next immediate observation is that they handle changesets differently. This is because Harvest and ClearCase use a hierarchical branching structure. Typically, you might have a series of environments that your code is going to progress through, and you might have different builds of code at each of these stages. You therefore create a branch that represents each of these stages and order them in a hierarchy. At the top of the stack you have your current production code, then you might have QA, then maybe System Integration, then Continuous Integration (CI), and at the bottom of the stack you have a branch where you are actually developing.

The way that Harvest and ClearCase work is that you create packages of changes that essentially ought to relate to features, and each package may contain multiple check-ins of code. When the feature is deemed to be complete in development it is "promoted" to the next level (i.e. CI), built and tested. You might vary the next bit depending on your project methodology, but essentially you take periodic releases of the software. This is usually done by promoting the tested packages from your CI environment and progressing them through the remaining branches until they become the production release. If bugs turn up as the code moves through the environments, fixes can be applied at whichever level they are found and added to the build.
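As a rough illustration of the promotion idea described above, here is a minimal sketch in Python. It is not how Harvest or ClearCase are implemented; the stream names, the `PackageTracker` class and the package names are all hypothetical, and the model assumes a package moves exactly one level at a time.

```python
# Hypothetical stream names, lowest to highest; not taken from any real tool.
STREAMS = ["Development", "CI", "System Integration", "QA", "Production"]


class PackageTracker:
    """Tracks which stream each package (feature) has reached."""

    def __init__(self):
        self.level = {}  # package name -> index into STREAMS

    def create(self, package):
        # New packages always start in the development stream.
        self.level[package] = 0

    def promote(self, package):
        # Move a package exactly one stream up the hierarchy.
        current = self.level[package]
        if current == len(STREAMS) - 1:
            raise ValueError(f"{package} is already in {STREAMS[-1]}")
        self.level[package] = current + 1
        return STREAMS[current + 1]

    def contents(self, stream):
        # A package is visible in the stream it has reached and in every
        # stream below it.
        idx = STREAMS.index(stream)
        return sorted(p for p, lvl in self.level.items() if lvl >= idx)


tracker = PackageTracker()
tracker.create("login-feature")
tracker.create("report-feature")
tracker.promote("login-feature")  # Development -> CI
print(tracker.contents("CI"))
print(tracker.contents("Development"))
```

The point of the sketch is that the unit being moved is the whole package, not the individual check-ins inside it.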

One of the sticking points here is that you can only check in changes against a single feature at one time, and so this makes concurrent development more difficult - and this is especially difficult when it comes to the bug-fixing stage of a project iteration, when lots of small changes are being made with high frequency. (I know we should only be promoting bug-free code, but this is the real world.)

In Team Foundation Server source control it is possible to set up a similar branching strategy and use the merging features as a means of promotion. You can still create all of the associated levels, but there is no hierarchy built into the system itself. Instead the hierarchy is enforced by the way you use it, and by the fact that merges can only be made up and down the lines of the branches.

One of the major contrasts with TFS lies in how it treats changesets. These can be (and usually are) associated with one or more work items, but they have a much looser correlation with features, and the changeset is the atomic unit of change rather than the feature. Therefore if you want to promote a feature you typically have to merge in all of the changesets up to the last changeset for that feature. This will also usually mean merging in all of the changesets that relate to other features as well, meaning that code on "unfinished" features may get promoted too.
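To make the "merge everything up to the last changeset for the feature" rule concrete, here is a small Python sketch. It only models the rule as described above and does not call TFS; the `Changeset` type, the changeset numbers and the work item labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    number: int      # changeset numbers increase with each check-in
    work_item: str   # hypothetical feature / work item label


def changesets_to_merge(history, feature):
    """Return every changeset up to the last one tagged with `feature`."""
    tagged = [c.number for c in history if c.work_item == feature]
    if not tagged:
        return []
    cutoff = max(tagged)
    ordered = sorted(history, key=lambda c: c.number)
    return [c for c in ordered if c.number <= cutoff]


history = [
    Changeset(101, "feature-A"),
    Changeset(102, "feature-B"),
    Changeset(103, "feature-A"),
    Changeset(104, "feature-B"),
]

merged = changesets_to_merge(history, "feature-A")
print([c.number for c in merged])  # [101, 102, 103]

# Work items that came along for the ride although they were not requested.
swept = sorted({c.work_item for c in merged} - {"feature-A"})
print(swept)  # ['feature-B']
```

Changeset 102 belongs to feature-B but sits before the cutoff, so it is carried along when feature-A is promoted, while 104 is left behind.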

This might appear to mean that TFS has got much less control (and there is some merit in that observation) but it also means that you have a more consistent behaviour when you merge. In the feature-based model of ClearCase it is possible that some dependent code, not changed in a feature being promoted but changed in another non-promoted feature, will change the overall behaviour of the solution. If you promote all changesets up to a given point then at least you know that your build will behave the same.
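The dependency problem described above can be shown with a toy example in Python. The file names, contents and feature labels are hypothetical, and `build` is just a stand-in for "what the source tree looks like", not a real build step.

```python
# Baseline in the target stream, plus development changes in check-in order.
# File names and contents are hypothetical.
BASELINE = {"tax.py": "rate = 0.15", "report.py": "print(total)"}
DEV_CHANGES = [
    ("feature-A", "report.py", "print(total * (1 + rate))"),
    ("feature-B", "tax.py", "rate = 0.20"),
]


def build(baseline, changes, include):
    """Apply the changes whose feature is in `include` on top of the baseline."""
    tree = dict(baseline)
    for feature, path, content in changes:
        if feature in include:
            tree[path] = content
    return tree


# Development contains both features; only feature-A is promoted.
dev_tree = build(BASELINE, DEV_CHANGES, {"feature-A", "feature-B"})
promoted_tree = build(BASELINE, DEV_CHANGES, {"feature-A"})

# feature-A was tested in development against rate = 0.20, but the promoted
# stream still has rate = 0.15, so the same feature behaves differently.
print(dev_tree["tax.py"], "|", promoted_tree["tax.py"])
```

Promoting every change up to a given point would give the target stream the same `tax.py` as development, which is the consistency argument made above.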

Release-Based Source Control Model

One of the implicit assumptions of the streaming approach is that you have a separate build in each stream and that the source code of the stream constitutes the "release" in many ways. An alternative model that is often used in TFS is to branch based on releases. Let's imagine you are shipping a software product, Widget 2009. You also have to support Widget 2008, Widget 2007 and Widget 2006, all of which are based on the same source code but with enhancements and developments in the intervening period. You have to support all of these products and be able to issue service packs against them. You also need to make sure that if you fix a bug in Widget 2006, that same fix can be merged into the later releases as well.

In this scenario the hierarchical streaming model is not suitable, because each time you promote a new set of features to your production stream and start building a new "production" release of your software you are effectively ending the ability to build your previous versions.

What you might do in this model is to have a branch for each release of your software, again with a build process for each branch, but each branch does not overwrite the others and has its own lifecycle. If you need to patch a previous release that is in "production" you don't have to overwrite any of the other production releases.
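A minimal Python sketch of the release-per-branch idea, using the Widget releases from the example above. The function names and the fix identifier are hypothetical, and each branch is reduced to a list of the fixes applied to it.

```python
# Release branches, oldest first; each has its own lifecycle.
RELEASES = ["Widget 2006", "Widget 2007", "Widget 2008", "Widget 2009"]


def forward_merge_targets(fixed_in, releases=RELEASES):
    """Return the later release branches that should also receive a fix."""
    idx = releases.index(fixed_in)
    return releases[idx + 1:]


def apply_fix(branches, fixed_in, fix_id):
    """Record the fix on its own branch and on every later release branch."""
    order = list(branches)  # dict keys keep insertion order
    for name in [fixed_in] + forward_merge_targets(fixed_in, order):
        branches[name].append(fix_id)
    return branches


branches = {name: [] for name in RELEASES}
apply_fix(branches, "Widget 2007", "fix-42")  # "fix-42" is a made-up id
print(branches)
```

A fix made in Widget 2007 flows forward to 2008 and 2009, while Widget 2006 is left untouched and can still be built and patched on its own.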

When to use each approach

I have discussed a couple of different approaches to branching and maintaining releases, both of which are in use in various organisations and each of which has its merits and demerits. The question is - if you have to put in place a branching strategy, which one would you choose? Which one is most appropriate in different circumstances? What are the pros and cons of each approach?

What I would say is that if you are shipping a software product where you need to be able to manage the source for many versions of the software at the same time, then you will need the second approach, with a branch for each release. This scenario involves many different users who use different releases from each other and therefore all need to be supported. Desktop applications definitely fall into this category, as do many other retail packaged software products such as components and server products.

The release-based approach does have its side-effects and these should not be ignored. The main side-effect is that if you have many releases of your software you end up with a large number of branches and build profiles, and these become difficult to manage. That said, an annual release of a software product isn't going to lead to unacceptable overhead in this model.

However, another very common scenario is where an organisation has a software product that needs to be refreshed on a periodic basis. When the upgrade happens all users are affected at the same time and cannot choose whether they take the upgrade or not - they are sent it anyway. This most commonly applies to software teams within a company producing bespoke software, but may also apply to self-updating applications such as iTunes, where upgrades are pushed out regularly, downloaded over the Internet and installed. It also applies to .com organisations, where you obviously have one current production build of your software.

In these scenarios you tend to have a high number of releases, especially with agile projects, and once a release has made it to production you discard the previous releases as you will never be opening up and servicing the old code. In this scenario it is easier to manage your source code if you have a limited number of streams and promote changes up through them, irrespective of whether you are promoting changesets in TFS or features in Harvest or ClearCase. Once you have got all of your branches building you may find you have a lower project overhead in maintaining your builds.

In conclusion...

A modern source control repository must support effective branching and merging in order to handle development of new versions of software whilst supporting current versions. The manner in which you branch will depend on your release cycle and the type of software that you produce. Picking the correct branching strategy for your project will have a direct impact on how effectively you can support your software, so take time to think about it and get it right.
