Thursday, February 21, 2008

Underestimating the importance of troubleshooting

I subscribe to a variety of mailing lists and forums, not only to find answers to my own questions, but because they are a great place to exchange information and ideas. On a good list, you get exposure to a wide array of technical people with varying backgrounds and experiences. Some discussions can be quite interesting, and I often learn new tricks or approaches just by reading them.

But one thing I have noticed in some forum questions, from both beginner and intermediate programmers, is a lack of solid troubleshooting skills. I find this very puzzling. Like it or not, if you program, you are going to have to troubleshoot code problems eventually. Even a single line of code has the potential for bugs. A firm grasp of the language, well-written code and testing will minimize the potential for errors, but they do not eliminate it. How quickly you are able to diagnose problems usually depends on your ability to troubleshoot, so I think it is a very important and necessary skill to develop.

Now I am not an expert by any means, but I do consider troubleshooting one of my strengths. Occasionally I get "tripped up" by stupid coding errors just like everyone else. However, I can usually spot the issue myself just by taking the time to think the problem through and doing a little debugging. Sometimes the solution comes quickly, other times not. But it is a necessary process, and one that I learn from, though in a few cases the most important thing I learned is what not to do ... ever again ;) While the experience did nothing to improve my headache at the time, I think it ultimately improved both my coding and problem-solving skills.

At times we all overlook the obvious or approach something from the wrong angle. But when I see questions in which the asker gives no evidence of trying to work the problem through, I have to shake my head. It is almost as if the asker saw an error message, or strange code behavior, got "freaked out" by it, and temporarily lost their common sense ;)

Granted, error messages are often generic or cryptic. Yet sometimes they tell you the exact cause of the problem, that is, if we bother to read them. In situations where error messages are ambiguous, or there is no error at all, applying a bit of logic and basic common sense can go a long way towards solving the problem.

There is no magical solution or one-size-fits-all approach, but there are a few basic steps I have found helpful when debugging code. Some of them may seem insultingly obvious, but I think they are worth restating ;) Hopefully they will help someone.

Assume it is a programming error, not a bug
While there is always a small chance a problem is caused by an obscure software bug, it usually is not. So I always start with the assumption that the problem is a programming error, until I have run enough tests to suggest or prove otherwise.

Read error messages
Sometimes error messages tell you everything you need to know to fix a problem. So read them. Thoroughly.

Google it
If you receive an obscure error, or one you do not understand, copy and paste it into Google, minus any application-specific information. Chances are you are not the first one to receive that error. The explanation and fix for many errors is often just a Google search away.
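As a rough illustration of the "minus any application-specific information" part, here is a minimal Python sketch. It assumes that quoted names, file paths and line numbers are the application-specific parts of a message; the function name `search_terms` and the regular expressions are hypothetical and would need adjusting for the errors you actually see.

```python
import re

# Hypothetical patterns for the parts of a message that are specific to
# one application; adjust them for the errors you actually see.
QUOTED = re.compile(r"(['\"]).*?\1")           # 'customer_id', "ORDERS"
PATH = re.compile(r"(?:[A-Za-z]:)?[\\/]\S+")   # /var/www/app/x.cfm, C:\app\x.txt
LINE_NO = re.compile(r"\bline \d+", re.IGNORECASE)


def search_terms(message: str) -> str:
    """Strip application-specific details, leaving the generic error text."""
    for pattern in (QUOTED, PATH, LINE_NO):
        message = pattern.sub(" ", message)
    # Collapse the gaps left behind into single spaces.
    return " ".join(message.split())


print(search_terms('Element "CUSTOMER_ID" is undefined in /var/www/app/report.cfm, line 42'))
# Element is undefined in
```

The leftover text is generic enough to match other people's reports of the same kind of error, which is what you want to paste into the search box.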

Identify the location
My first step is usually to identify the location of the problematic code. That may be as simple as reviewing an error message, or it may not. If there is no error, I pick a last known good state in the process, in other words a point in the code that works correctly, and work forward from there, a few lines at a time, until I can identify where the code fails. Each time I move forward, I re-check the state of my variables/objects. If the results are good, I repeat the process until I hit an area where the test fails. If it makes more sense to start from a first known bad state, I simply reverse the process. Sometimes identifying the location also identifies the problem itself. If it does not, I continue debugging.
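The "last known good state" approach can be sketched in Python as follows. This is a hypothetical example, not code from a real application: the process is modeled as a list of named steps, and `is_good` stands in for whatever check you would normally do by hand on your variables after each step.

```python
from typing import Callable, List, Optional, Tuple

# Each step has a name and a function that takes the current state
# (a dict of variables) and returns the new state.
Step = Tuple[str, Callable[[dict], dict]]


def first_failing_step(state: dict, steps: List[Step],
                       is_good: Callable[[dict], bool]) -> Optional[str]:
    """Walk forward from a known good state; return the first step that breaks it."""
    if not is_good(state):
        raise ValueError("the starting state is not a known good state")
    for name, step in steps:
        state = step(state)
        # Re-check the variables after every step, not just at the end.
        if not is_good(state):
            print(f"state went bad after {name!r}: {state}")
            return name
    return None  # every step left the state good


# Hypothetical pipeline with a deliberate bug in the "discount" step.
steps: List[Step] = [
    ("load", lambda s: {**s, "prices": [10.0, 20.0, 30.0]}),
    ("discount", lambda s: {**s, "prices": [p - p * 1.1 for p in s["prices"]]}),  # bug: meant p * 0.9
    ("total", lambda s: {**s, "total": sum(s["prices"])}),
]


def prices_positive(s: dict) -> bool:
    """Hypothetical definition of "good": every price so far is positive."""
    return all(p > 0 for p in s.get("prices", []))


print(first_failing_step({}, steps, prices_positive))  # returns "discount"
```

Because the check runs after every step, the first failure points at the step to inspect, and the steps after it never muddy the picture.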

Isolate the conditions
Determine if a problem occurs all the time, or only under certain conditions. The answer often gives valuable clues about the cause of the problem and/or how to fix it.
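A minimal Python sketch of isolating the conditions: run the same code against a handful of inputs and record which ones fail. The `average` function and the sample inputs are hypothetical; the inputs need to be hashable because they are used as dictionary keys.

```python
from typing import Any, Callable, Dict, Iterable


def failing_conditions(func: Callable[[Any], Any],
                       inputs: Iterable[Any]) -> Dict[Any, str]:
    """Call func with each input; map every input that raised to its error."""
    failures: Dict[Any, str] = {}
    for value in inputs:
        try:
            func(value)
        except Exception as exc:  # collect every failure instead of stopping
            failures[value] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical function under suspicion: average a comma-separated list.
def average(csv: str) -> float:
    numbers = [float(x) for x in csv.split(",")]
    return sum(numbers) / len(numbers)


for value, error in failing_conditions(average, ["1,2,3", "4", "", "5,,6"]).items():
    print(repr(value), "->", error)
```

Only the inputs containing an empty field fail, which narrows the cause down far more than "average is broken" would.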

Test the obvious first
Most of us like to believe we are above making simple mistakes, but we are not. So forget about ego and do not overlook the obvious ;) The simplest or most likely cause often turns out to be the cause. Since those conditions are often the quickest and easiest to test, check them first.
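One way to put "test the obvious first" into practice is to write the quick sanity checks down in order, most likely and cheapest first, and stop at the first one that fails. A minimal Python sketch; the `settings` dictionary and the checks themselves are hypothetical examples.

```python
from typing import Callable, List, Optional, Tuple

# A check is a short description plus a function returning True when all is well.
Check = Tuple[str, Callable[[], bool]]


def first_obvious_problem(checks: List[Check]) -> Optional[str]:
    """Run the checks in order and return the description of the first failure."""
    for description, check in checks:
        if not check():
            return description
    return None


# Hypothetical settings, e.g. read from a text file, so the port is a string.
settings = {"host": "localhost", "port": "8080"}

checks: List[Check] = [
    ("settings were loaded", lambda: bool(settings)),
    ("'port' key is present", lambda: "port" in settings),
    ("port is an int, not a string", lambda: isinstance(settings["port"], int)),
]

print(first_obvious_problem(checks))  # port is an int, not a string
```

Ordering matters here: the key check runs before the type check, so a missing key is reported plainly instead of raising a `KeyError` inside the next check.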

Do not make assumptions about the code
When debugging code, if you expect an if/else conditional to evaluate to true, verify that it actually does. Logic errors are easy to miss because they do not always cause an exception. So as you step through the code, always verify the expected results match the actual results.
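A small Python illustration of why it pays to verify a conditional instead of assuming it. The discount codes and the `price_after_code` function are hypothetical; the point is that a code read from a form or file with a trailing newline fails the membership test silently, with no exception, and printing the `repr` of the input makes that visible.

```python
VALID_CODES = {"SAVE10", "SPRING"}  # hypothetical discount codes


def price_after_code(price: float, code: str) -> float:
    is_valid = code in VALID_CODES
    # Do not assume the branch is taken: show the exact input and the result.
    # repr() (the !r conversion) exposes hidden characters such as a trailing "\n".
    print(f"code={code!r} is_valid={is_valid}")
    if is_valid:
        return round(price * 0.9, 2)
    return price


print(price_after_code(50.0, "SAVE10\n"))          # no discount applied
print(price_after_code(50.0, "SAVE10\n".strip()))  # discount applied
```

Once the debug output shows `is_valid=False` for what looked like a valid code, the fix (stripping the input) is obvious; without the check, the missing discount could easily be blamed on some other part of the code.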

My conclusion: While developing effective troubleshooting skills takes time, mostly it is just common sense.

