Fixing bugs quickly, killing zombie bugs

Zombie bugs

One of the most embarrassing things for a professional developer is when a bug from a closed issue is reported again, and it looks like the bug is still alive.

Example

The example described in this post is based on a real bug in one of the projects I was supporting.

What happened for our client was that the wrong customer was being assigned to an order in the software. More precisely, it was a customer with a similar last name. It looked like a human mistake but, unfortunately, it was easy to recreate.

The database had many old customers and some new customers from an import script. With numbers that large, the bug should have been reported months ago, so why now?

To make it more complicated, we were only maintaining the integration and providing support for the client’s application. The app was closed source.

Fixing bugs quickly and deadly

To be quick, it’s good to have some automation, even if it only means following a few proven points or procedures. Bugs often differ from each other, so a general list might not be perfect, but it is better than not knowing where to start.

To fix a bug, you have to really fix it. Not just hide the symptoms. Fixing bugs quickly is not the same as hiding bugs, even if the latter is quicker.

1. Compare working and non-working data

Where does the data that produces the wrong output differ from similar data that works 100%?

In the customer-order example, the customers were compared in the software the client was using. The difference found was a missing country shortcode for the affected customer. The country field was updated for that customer in the software, and it worked.
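A comparison like this can be scripted so that no field is skipped by eye. The following is only a minimal sketch: the field names, the sample values and the diff_records helper are all made up for illustration and do not come from the client's application.

```python
def diff_records(working, broken, ignore=("id",)):
    """Return {field: (working_value, broken_value)} for every field that differs.

    Fields listed in `ignore` (identifiers that always differ) are skipped.
    """
    fields = sorted((set(working) | set(broken)) - set(ignore))
    return {
        field: (working.get(field), broken.get(field))
        for field in fields
        if working.get(field) != broken.get(field)
    }


# Hypothetical records: one customer that works, one that triggers the bug.
working_customer = {"id": 101, "last_name": "SMITH", "country": "GB", "city": "Leeds"}
broken_customer = {"id": 202, "last_name": "Smith", "country": None, "city": "Leeds"}

differences = diff_records(working_customer, broken_customer)
for field, (good, bad) in differences.items():
    print(f"{field}: working={good!r} broken={bad!r}")
```

Every reported difference is a candidate cause, not a verdict. Which one really matters still has to be confirmed by reproducing the bug.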

1.1. What’s changed recently?

Analyse what changed in the data and connect it with recent changes. Here the answer was simple: the import script was the only change.

The source of the problem was fixed as well (the missing country field in the import script).

1.2. Compare only raw data

Do not modify the data; compare it in its original form (database, XML, etc.).

A mistake made at this point was that the customers were first compared through the user interface instead of as rows in the table. This mistake, along with not following the points given below, caused the bug to happen again for another customer a few weeks later.

When the bug was reported again, the raw data were compared before making any changes. The original customers had their names in uppercase, while many of those imported by the script did not.

The software always displayed a customer’s details in uppercase, and saved the data in uppercase as well. Even if a raw data check did happen the first time, it was done after the first fix (after the update), and only two rows were compared (instead of a larger set of customers).

The improper compare-and-fix process removed only the symptom. The update of the country field saved all fields in uppercase, which fixed the problem for that particular customer.
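Checking the raw rows for a whole set of customers, rather than two of them, can be done with a read-only query. The sketch below uses an in-memory SQLite database as a stand-in; the table name, the columns and the sample rows are assumptions for illustration, since the real schema is not shown in this post.

```python
import sqlite3

# Stand-in database; the real application's schema is unknown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, last_name, country) VALUES (?, ?, ?)",
    [
        (1, "SMITH", "GB"),   # old customer, saved by the application
        (2, "JONES", "GB"),   # old customer
        (3, "Smith", None),   # imported by the script
        (4, "Smithe", None),  # imported by the script
        (5, "BROWN", "PL"),   # old customer
    ],
)


def rows_not_uppercase(conn):
    """Read the raw rows (no updates) and return those whose name is not all uppercase."""
    rows = conn.execute("SELECT id, last_name FROM customers ORDER BY id").fetchall()
    return [(cid, name) for cid, name in rows if name != name.upper()]


def rows_missing_country(conn):
    """Return ids of rows with an empty or NULL country field."""
    rows = conn.execute(
        "SELECT id FROM customers WHERE country IS NULL OR country = '' ORDER BY id"
    ).fetchall()
    return [cid for (cid,) in rows]


suspects_by_case = rows_not_uppercase(conn)
suspects_by_country = rows_missing_country(conn)
print(suspects_by_case)
print(suspects_by_country)
```

Because both checks only read, they can be run before anything is touched, and on the full table instead of a pair of rows.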

2. Control when the bug comes out

This is simple. You have to be able to make the bug happen manually, to reproduce it on demand.

In the first fix, the country info should have been removed again to confirm that the bug came back, and then added again to confirm that the bug disappeared.

The procedure was applied for the second fix: changing the name to lowercase actually caused the bug, while the name in uppercase worked fine. The name was changed back and forth a few times to be sure (probably the third-party software was uppercasing the name when searching for the customer by name, instead of searching by id).
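The toggle can be written down as a small repeatable check. The application was closed source, so find_customer_id below is purely a guess at the kind of lookup that would behave this way (uppercase the search term, exact match first, then a prefix match); it is a stand-in for the experiment, not the vendor's code, and the names and ids are invented.

```python
def find_customer_id(customers, name):
    """Hypothetical stand-in for the closed-source lookup.

    Uppercases the search term, tries an exact match on the stored name,
    then falls back to the first stored name starting with the term.
    """
    needle = name.upper()
    for cid, stored in customers.items():
        if stored == needle:
            return cid
    for cid, stored in customers.items():
        if stored.startswith(needle):
            return cid
    return None


def bug_present(stored_name):
    """Store customer 2 under `stored_name` and check whether an order for
    'Smith' ends up assigned to somebody else."""
    customers = {1: "SMITHSON", 2: stored_name}
    return find_customer_id(customers, "Smith") != 2


# Flip the suspected cause back and forth and record the result each time.
observations = [(name, bug_present(name)) for name in ["Smith", "SMITH", "Smith", "SMITH"]]
for name, present in observations:
    print(f"stored as {name!r}: bug present = {present}")
```

If the bug does not follow the switch every single time, the suspected cause is not the real one, or not the only one.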

3. Fix broken data

When the bug is finally resolved, all the damaged data should be repaired if possible.

Although the country data and the names were fixed, the broken order (connected to the wrong customer) was not.
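Finding the damaged records is easier with a list than with memory. The sketch below assumes that the integration keeps its own record of which customer each order was meant for. That is an assumption made for this example, as are the order ids and the find_misassigned_orders helper.

```python
def find_misassigned_orders(app_orders, intended):
    """Return (order_id, assigned_customer, expected_customer) for every order
    whose customer in the application differs from the intended one.

    Orders without an intended customer on record are skipped, because there
    is nothing to compare them against.
    """
    mismatches = []
    for order_id, assigned in sorted(app_orders.items()):
        expected = intended.get(order_id)
        if expected is not None and expected != assigned:
            mismatches.append((order_id, assigned, expected))
    return mismatches


# Hypothetical data: order id -> customer id.
app_orders = {5001: 1, 5002: 2, 5003: 1}  # what the application stored
intended = {5001: 1, 5002: 2, 5003: 2}    # what the integration meant to send

to_repair = find_misassigned_orders(app_orders, intended)
for order_id, assigned, expected in to_repair:
    print(f"order {order_id}: assigned to {assigned}, should be {expected}")
```

The output is a repair list to review, not an automatic fix. Each order still has to be corrected and confirmed with the client.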

4. Do the status check

Using bug tracking software should help to see if something was fixed or not, but always communicate to be sure.

Here, the second developer assumed that the bug was fixed (he checked that it was not happening again, without asking whether it had really been fixed), so the fix was communicated to the client.
Not surprisingly, the bug was quickly reported for the third time. Fortunately, it was just that one broken order to be repaired.

5. Remove the broken root

Although the first fix removed a source of the problem (the missing country info), that fix was not needed (while handy, the data were redundant).

Be precise: when one tire is damaged, don’t replace all of them.

In some cases, fixing the root at the wrong time may cause more problems.
Imagine fixing the orders, as well as the orders module, while new orders are added every minute. The patch is going to break existing orders, because the new module will only accept correct orders of the new type.

6. Prevention test for zombies

A nice (professional) thing to do is to check for the bug again in the near future. Let’s say that when a bug has had a “Solved” status for a week, an e-mail is sent to the development team to check it. If everything is still fine, the bug can be moved to “Closed”.
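The selection of bugs that are due for a re-check is easy to automate. This is a minimal sketch under stated assumptions: the bug records, their field names and the dates are invented, and a real bug tracker would supply this data through its own export or API. Sending the e-mail is left out.

```python
from datetime import date, timedelta


def due_for_recheck(bugs, today, days=7):
    """Return ids of bugs that have been in "Solved" status for at least `days` days."""
    cutoff = today - timedelta(days=days)
    return [
        bug["id"]
        for bug in bugs
        if bug["status"] == "Solved" and bug["solved_on"] <= cutoff
    ]


# Hypothetical tracker export.
bugs = [
    {"id": "BUG-1", "status": "Solved", "solved_on": date(2024, 3, 1)},
    {"id": "BUG-2", "status": "Solved", "solved_on": date(2024, 3, 6)},
    {"id": "BUG-3", "status": "Closed", "solved_on": date(2024, 2, 1)},
]

due = due_for_recheck(bugs, today=date(2024, 3, 8))
print(due)
```

Run daily, a check like this gives the team the list of bugs to verify before they are moved to “Closed”.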

Summary – fixing bugs quickly

All the steps above can take more time than just a cowboy fix “I fixed it in line #42, solved!”.

However, the longer way of fixing a bug is actually quicker. Every time a bug comes back, all the procedures happen again, and clients are not always so nice as to mention that the current bug was reported before (or the bug is reported by someone else, unaware of previous problems).
That’s why extra effort spent on killing the bug for good is effort saved: the effort of dealing with an undead nightmare which may bite us weeks later.
