Wednesday, March 11, 2009

The Joys of Legacy Software #2: Taking over a legacy codebase

"...the application  I am looking at is an intranet-based line of business system written in C# / ASP.Net.  The client has been having difficulty in developing the system because insufficient thought has been given to the design beforehand..."

The Problem

This is often the case when you take over systems;  there is a system live on the client's premises, and you are presented with a dump of the code.  You hope that when you build the code and restore a backup of the databases that everything will work.  But will it?  What sort of state is it in?  Are all the references OK? 

In this post I will be describing some of the steps that you need to take as a consultant when you take over responsibility for someone else's code.  It's not pretty, but getting some of the basics right will help you a lot later on.

Step 1:  Getting a grip on the code

1a:  Get the code in your own repository

There's really nothing else you can do in this situation.  You just have to take the code, get it under your own control, build it and then do some serious regression testing against the system on the client's site.  If any issues are found then you need to highlight them on day one and, if necessary, give the client a cost for fixing them.

Fortunately for me, everything here was OK.  I use TFS 2008 as my code repository, and the first thing to do is to get the initial cut of the software into source control.  The structure I use is as follows:

Customer
 - Application
 -  - Release Version

What I have done in this case is to put the code I have taken over under a v1.0 branch and built it from there.  This then becomes the "reference" version of the system: the v1.0 branch will stay in source control as-is and will not be modified in any way, so it can always be used for reference.

1b:  Create your test environments

This is something to do now.  NOW.  In what is sometimes called the inception phase of the project.  Or "iteration zero".  Whatever you want to call it.  Before you get any further, create the test environments.  I am going to have two environments: one a "reference" installation that mirrors v1.0 of the system, and the other for my upgraded system, v2.0.

In my case the system is simple: to deploy each instance of the application I only need a web server and a SQL server.  I also need Active Directory, so I have created an AD installation as well that will be shared across both of my installations, making five servers in all.  I have created a set of test users in AD and I am ready to go.

1c:  Create the reference deployment

My v1.0 branch has the same code base as the existing system (allegedly), but since I only need to deploy this system once I am not averse to a bit of a manual deployment.  The key thing here is to get the correct binaries onto the test system and get the database restored.  Tweak the config files to connect to the correct servers and give it a spin.

The main issues I have had in getting the system working are as follows:
  • The data access layer uses NHibernate, so the connection string is in the NHibernate config file, which is deployed in the bin directory.  This needs to be modified with the correct connection string.  [Horror moment #1:  The connection string in the config file that I have received from the client has the username and the password included in the connection string!  What is more, commented out in the same file are the connection strings for both the UAT and the production environments!  All unencrypted!  Ouch!]
  • As often happens after restoring database backups, you have to remove the old users and add in the new service accounts.  Note that it's always best to add a service account as a user on your database and then set this service account as your app pool identity in IIS, or use some other form of Windows authentication.
  • The system sends out emails to users during the workflow (another big future topic here), and of course the SMTP settings are included in the web.config of the web site, so these need to be tweaked as well.
  • The system needs to save files as part of the process, and these are located on a share.  The web.config also contains the location where these files are stored (yet another post in the making here on file storage, document libraries and file storage services).
These tweaks are crucial for your future deployments.  Note all of them down, as they will become the environment-dependent parameters that your build and deployment process will need to be able to configure on a per-environment basis later on; a rough sketch of how the application might read them is below.
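
To keep those parameters easy to find later, I like to funnel them through one place in the code.  This is only a minimal sketch, and the setting names ("Main", "SmtpHost", "FileStoreShare") are hypothetical rather than the client's real ones:

    using System;
    using System.Configuration;   // add a reference to System.Configuration.dll

    // Hypothetical helper that gathers the environment-dependent settings in one place.
    public static class EnvironmentSettings
    {
        // Prefer a connection string using Integrated Security=SSPI so that no
        // username or password ever needs to live in the config file.
        public static string DatabaseConnectionString
        {
            get { return ConfigurationManager.ConnectionStrings["Main"].ConnectionString; }
        }

        public static string SmtpHost
        {
            get { return ConfigurationManager.AppSettings["SmtpHost"]; }
        }

        public static string FileStoreShare
        {
            get { return ConfigurationManager.AppSettings["FileStoreShare"]; }
        }
    }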

1d:  Regression test

This does what it says on the tin.  When you have got your reference installation you need to regression test it against the expected behaviour.  At this point log all known issues or you'll be held accountable for them later!!!  

Step 2:  Start to sort out the mess

After you have got ownership of the code and you have been able to establish that the code you have been shipped actually works, you now need to get to grips with it and sort it out so you don't have a totally flaky foundation.

2a:  Create the working branch

At this point we have created the v1.0 branch.  What we do now is branch the code to create the new working v2.0 branch so that we can start making changes to the system.  This means that if you do a get latest on the v1.0 branch you will always have a reference to what was there before.  All I would do at this point is use TFS to create a branch of the v1.0 code.

2b:  Upgrade to latest version of .Net

This is the ideal opportunity to keep the system current.  If you now check out your complete v2.0 branch you should be able to open the solution(s) in Visual Studio 2008 and let it run the upgrade.  You don't need to keep any logs or backups because you will always have your v1.0 branch as a backup.

During this process I had a bit of a nightmare upgrading my website.  The intranet site was one of those awful .Net 2.0 websites where the code-behind is deployed along with the web forms.  The code-behind had no namespaces on it, and as the code is designed to JIT-compile into temporary assemblies you do not get all of the classes you want.  There is also further code in the App_Code folder, which is an evil in its own right: if you have this on your dev server, IIS will keep trying to compile it even when you have compiled it all into an assembly, and you sometimes get namespace / type clashes because of this when the app runs.

What I ended up doing (and I had the luxury of not having a website with too many pages and controls in it, perhaps 20 web forms and 40 controls) was to create a new web application from scratch and then migrate in the web forms one at a time, by creating new forms with the same names and then copying in first the markup and then the code-behind.  This is a really tedious process, but this way you know that you have got a fully compiled, namespaced application.  I also tend to rename the App_Code folder to just _Code or something like that so as not to confuse my web server.  Remember to set all of the C# files as Compile and not as Content if they have come across from the App_Code folder.
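
For each migrated page the end result looks something like the fragment below (the names are hypothetical): an explicit namespace and a partial class, so the page compiles into the single web application assembly.  Remember that the Inherits attribute in the .aspx page directive has to be updated to the fully qualified type name as well.

    using System;
    using System.Web.UI;

    namespace Customer.Application.Web
    {
        // Code-behind that previously sat in an un-namespaced class in the old web site.
        public partial class OrderList : Page
        {
            protected void Page_Load(object sender, EventArgs e)
            {
                // page logic copied across unchanged from the old code-behind
            }
        }
    }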

2c:  Tidy up the references

When you have a web site in VS2005 and you add a reference, what effectively happens is that the referenced assembly gets copied into the bin directory of the web site and it is then available to be used.  This is no use for a web application project, as we must reference all of the assemblies so that the compiler can work out the references.

When creating a code structure, what I usually do is as follows.  I will start from the working branch (in this example it is v2.0).

 - v2.0
 -  - Build [More about this in a later post]
 -  - Referenced Assemblies
 -  -  - Manufacturer X
 -  -  - Manufacturer Y
 -  -  - Manufacturer Z
 -  - Solutions
 -  -  - Solution A
 -  -  - Solution B
 -  -  - Solution C

So, having got a bin directory full of strange assemblies, I then copy them out into the Referenced Assemblies folder and delete them from the bin.  I add file references to the projects in my solution for the obvious assemblies and then use a trial-and-error process of compiling my application until I have referenced all of the assemblies I need.  You'd be surprised how many of the assemblies in the bin directory are not direct references of the web site but have ended up there because they are referenced by a dependent project.
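
If you want to take some of the guesswork out of that, a throwaway console utility along these lines (just a sketch) will list each assembly in the old bin folder together with the assemblies it references, which makes it much easier to see what is a direct reference and what has merely been dragged in by something else:

    using System;
    using System.IO;
    using System.Reflection;

    // Throwaway audit tool: lists every DLL in a folder and the assemblies it
    // references.  Reflection-only loading reads the metadata without running
    // any code from the assemblies.
    class BinAudit
    {
        static void Main(string[] args)
        {
            string binPath = args.Length > 0 ? args[0] : @".\bin";

            foreach (string dll in Directory.GetFiles(binPath, "*.dll"))
            {
                try
                {
                    Assembly asm = Assembly.ReflectionOnlyLoadFrom(dll);
                    Console.WriteLine(asm.GetName().Name);

                    foreach (AssemblyName reference in asm.GetReferencedAssemblies())
                    {
                        Console.WriteLine("    -> " + reference.FullName);
                    }
                }
                catch (BadImageFormatException)
                {
                    Console.WriteLine(Path.GetFileName(dll) + " is not a managed assembly");
                }
            }
        }
    }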

OK.  After this we are good on the references front.  We know what assemblies we have got to deploy and they are all in source control.  We're starting to get to the point where we could actually get someone else to do a get latest and see if they can build the beast.  In fact that's not a bad idea.  Go and ask someone right away.  You've nothing to lose.

2d:  Tidy up namespaces and assembly names

You'd be surprised (maybe) how often you take a solution, build it, look in the bin and find that the assemblies all have different naming standards.  Look at the namespaces too and these might all be different as well.  It's a pain in the butt, but you need to decide on your namespace structure, go through each of the projects and set the namespaces and assembly names.
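
A simple convention that works well here is for the assembly name and the root namespace to match, so you can tell at a glance where a type lives.  Purely as an illustration (the names are made up):

    // The project "Customer.Application.DataAccess" builds Customer.Application.DataAccess.dll,
    // and every type in it sits under the matching namespace.
    namespace Customer.Application.DataAccess
    {
        public class OrderRepository
        {
            // data access code for orders lives here
        }
    }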

2e:  Tidy up versions and strong names

While you're in there, remember that this is also a good time to set your assembly versions.  If you are working in a v2.0 branch you might want to make all of your new DLLs v2.0.0.0 as well.
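
In practice that just means setting the version attributes in each project's AssemblyInfo.cs (or in a single shared AssemblyInfo.cs linked into every project).  The values below are only an illustration:

    using System.Reflection;

    // Version attributes for the v2.0 branch.  Keeping these in one linked file
    // means every assembly in the branch carries the same version number.
    [assembly: AssemblyVersion("2.0.0.0")]
    [assembly: AssemblyFileVersion("2.0.0.0")]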

And this is a good time to create a key file and sign all of your assemblies.  Even the one in the website.  This is sometimes a moment of truth for your referenced assemblies as well, because you can't sign an assembly that references an unsigned assembly.  At Solidsoft we have been working for so long with BizTalk, which requires your assemblies to be in the Global Assembly Cache (GAC), that we sign all of our assemblies as a matter of routine.  More seriously though, there is a code security aspect here as well.  You sign assemblies so that they cannot be tampered with.  You don't want to be in the situation where one of your assemblies is recompiled by a hacker with some malicious code in it, and signing removes this risk at a stroke.
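
Before you start, it is worth finding out which of the third-party assemblies are going to block you.  A quick and dirty check (again, just a sketch) is to read the public key token of everything in the Referenced Assemblies folder; an empty token means the assembly is unsigned:

    using System;
    using System.IO;
    using System.Reflection;

    // Lists each DLL in a folder with its public key token.  Assemblies with an
    // empty token are unsigned and cannot be referenced from strong-named code.
    class StrongNameCheck
    {
        static void Main(string[] args)
        {
            string folder = args.Length > 0 ? args[0] : @".\Referenced Assemblies";

            foreach (string dll in Directory.GetFiles(folder, "*.dll"))
            {
                byte[] token = AssemblyName.GetAssemblyName(dll).GetPublicKeyToken();
                string result = (token == null || token.Length == 0)
                    ? "NOT SIGNED"
                    : BitConverter.ToString(token);
                Console.WriteLine(Path.GetFileName(dll) + ": " + result);
            }
        }
    }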

When I went through the signing process I found that the UIProcess application block hadn't been signed when it was compiled, and the codebase I had only referenced the DLL, so I took a bit of a risk downloading the source code, signing it and replacing the assembly.  There was an issue with the configuration so I had to modify the config schema, but other than that everything went fine and I was all sorted out.

Step 3:  Create the build process

This is the time to get this right.  A good build and deployment process can seem like it sucks up no end of project time but you get the payback later when you are trying to stabilise and ship your system.

I use TFS and TFSBuild as my build environment now, although I have used MSBuild and CruiseControl.Net in the past.  I have two build definitions as a minimum.  The first just runs a compile on all solutions and runs the unit tests but does not deploy; this is triggered on check-in and so is effectively my "CI" (continuous integration) build.  My other build definition is a build / deploy and will push my build out onto a test environment.  I use InstallShield to create the MSIs, xcopy them over onto the test server and then use PSExec to install via a command line.

Review

This has been quite a long post, and in real life this part of my project was a real slog, but at the end of it we're in quite good shape now.  We have got a repeatable process for delivering our application and this is the minimum level you need to be able to ensure quality.  Once you are here, with a build automation process and automated deployment, you can start to overlay automated testing as well as the traditional manual UI testing, but without getting some of the quality in place at this stage you'll never get the results later.

I hope that this has been useful, and that this post has either given you some ideas on organising your solutions or has made you think why you organise your source as you do.  Next time I'll be digging into the application and seeing what's in there.  I'll warn you - some of it isn't pretty!
