Deployment Recommendations

Today is a day of reflection. I have been thinking about everything that happened over the past couple of months: lots of overtime because a third party carelessly released something that broke a lot of things. A major deployment was made, and hundreds of errors suddenly appeared.

That makes me think: what happened to testing? What about standard practices for major deployments?

I had to spend a lot of time hardening our product, fool-proofing it so that if they do something similar again, it won’t affect us…

After week after week of telling them about the errors and explaining what we needed, everything finally went back to normal.

It all starts with development. The first step toward a successful deployment is testing, testing, testing.

You need to make sure the code you are delivering is rock solid: no uncaught exceptions, correct validations, etc. You also need to make sure it can be deployed.

And before I forget, did I mention you have to test?

One comment I heard was “but there are a million rules in the system”. Well, if you are going to do a major version upgrade that will affect all your customers, who have millions of rules, then you have a million tests to write.

Now, hold on a second, a million tests?

By the time we go live the application will already be obsolete…

It all starts with prevention. Don’t you have unit tests in place to automate the testing? (Really?)
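A million rules does not have to mean a million hand-written tests. Here is a minimal sketch of a table-driven test, assuming the rules can be listed as data. The `apply_rule` function, the rule names, and the `RULE_CASES` table are all made up for illustration; a real system would load its own cases from wherever the rules live.

```python
import unittest


# Hypothetical stand-in for the system under test: applies a named rule to a value.
def apply_rule(rule_name, value):
    rules = {
        "non_negative": lambda v: v >= 0,
        "max_length_10": lambda v: len(str(v)) <= 10,
    }
    return rules[rule_name](value)


# Hypothetical rule table: (rule name, input, expected result).
# In practice this could be generated from the rules store instead of typed by hand.
RULE_CASES = [
    ("non_negative", 5, True),
    ("non_negative", -1, False),
    ("max_length_10", "short", True),
    ("max_length_10", "x" * 11, False),
]


class RuleRegressionTest(unittest.TestCase):
    def test_all_rules(self):
        # One loop covers every row; subTest reports each failing row separately.
        for rule_name, value, expected in RULE_CASES:
            with self.subTest(rule=rule_name, value=value):
                self.assertEqual(apply_rule(rule_name, value), expected)


def run_rule_tests():
    """Run the rule regression suite and return True when every case passes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RuleRegressionTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Adding a rule then means adding a row, not writing a new test, which is what makes running the whole set before every release realistic.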

So let’s go to the second recommendation.

Let’s assume (as happens very often) you don’t have your testing framework in place. Let’s do the deployment anyway, with one condition: no downtime, or as little downtime as possible.

What? Can’t do that? Don’t know how?

When a release has to be made and there are too many rules or validations to verify in the time you have, a rollback strategy is required. (A rollback procedure is ALWAYS required.)

The deployment process is simple: backup, deploy, test. Does it all work?

Then good. It doesn’t work? Take 20 minutes to figure out why. Still failing? Roll back.
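The backup/deploy/test/rollback loop can be sketched in a few lines. This is only an illustration: the five callables (`backup`, `deploy`, `smoke_test`, `try_fix`, `rollback`) are hypothetical hooks you would wire to your own tooling, and the 20-minute budget is simply the number mentioned above.

```python
import time


def deploy_with_rollback(backup, deploy, smoke_test, try_fix, rollback,
                         fix_budget_seconds=20 * 60, clock=time.monotonic):
    """Back up, deploy, test; allow a bounded time to fix, otherwise roll back.

    All five callables are hypothetical hooks supplied by the caller.
    """
    snapshot = backup()          # always take the backup first
    deploy()
    if smoke_test():
        return "deployed"

    # Something failed: allow minutes (not days) to figure it out.
    deadline = clock() + fix_budget_seconds
    while clock() < deadline:
        try_fix()                # assumed to block while someone investigates
        if smoke_test():
            return "deployed after fix"

    # Budget exhausted and still failing: restore the backup.
    rollback(snapshot)
    return "rolled back"
```

The `clock` parameter exists only so the time budget can be exercised in a test without actually waiting 20 minutes.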

If the deployment is on a larger scale, your planning has to be done with more care.

Best approach? Do a side-by-side deployment and move your customers over one by one, or in controlled groups. If one of them hits an error, put them back where they were (they were happy there, right? I know that when things work I’m happy), keep working on their issues until fixed, then rinse and repeat for the next customer.
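As a rough sketch of that side-by-side rollout, assuming the old and new environments both stay up and a customer can be switched between them: `migrate`, `is_healthy`, and `revert` are hypothetical hooks, and the customer names in the usage example are invented.

```python
def staged_rollout(groups, migrate, is_healthy, revert):
    """Move customers to the new environment group by group.

    A customer who fails the health check is put back on the old environment,
    and the rollout pauses so their issues can be fixed before continuing.
    Returns (migrated, failed, pending) lists of customers.
    """
    migrated = []
    for index, group in enumerate(groups):
        failed = []
        for customer in group:
            migrate(customer)
            if is_healthy(customer):
                migrated.append(customer)
            else:
                revert(customer)      # back to where they were happy
                failed.append(customer)
        if failed:
            # Pause here: later groups are untouched and still on the old side.
            pending = [c for g in groups[index + 1:] for c in g]
            return migrated, failed, pending
    return migrated, [], []


if __name__ == "__main__":
    location = {}
    result = staged_rollout(
        groups=[["acme"], ["globex", "initech"], ["umbrella"]],
        migrate=lambda c: location.__setitem__(c, "new"),
        is_healthy=lambda c: c != "initech",   # pretend one customer breaks
        revert=lambda c: location.__setitem__(c, "old"),
    )
    print(result)
    print(location)
```

Once the failing customer is fixed, the same function can be called again with the failed and pending customers as the new groups.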

It’s a slower approach, but it is the best one (little downtime, no headaches).

Another approach is to go cold turkey and move everyone to production at the same time, but this is risky: you put your own reputation and your customers’ reputations on the line, which is by far the worst thing you can do. Most likely after that they will want to get rid of you (and eventually they will).

(*rant start*) That is what happened to us: we, “the customer”, had to do the provider’s QA/QC (grrrr), and we had no option for rollback (“no, we can’t do that”)... well, that sucks big time and you know it, homenet... (*rant end*)

To sum up, to do a deployment:

  1. Test
  2. Integrate your code and test again
  3. Back up
  4. Deploy and test; if there are errors, allow some minutes (not days) to fix them
  5. If you can’t fix it, roll back
  6. If you can’t roll back, quit and go work at McDonald’s (a poorly cooked hamburger can go in the trash and only 1 customer will be affected)

If for any reason you cannot do a partial upgrade or a roll-forward approach, something is not right in your design.

Always separate the layers of your application.

image

I will expand more on this next time. (UX stands for User Experience)