The ‘5 Whys’ technique for debugging an issue

Debugging an issue in code can be a time-consuming and difficult process. It gets even harder when the bug is a race condition in a multi-threaded application, or when you are thrown into an unfamiliar code base. However, there is a methodology that can help you navigate this madness: the 5 Whys technique.

The 5 Whys technique was used by Toyota as a way of diagnosing issues with their vehicles. The example below is taken from the Wikipedia article on the technique and shows how the method might play out among the technicians at Toyota when a car comes in with an issue.

Issue: Car has entered the shop and does not want to start.

  • Why is the car not starting?
    • Reason: The battery is dead.
  • Why is the battery dead?
    • Reason: The alternator was no longer charging the battery.
  • Why was the alternator not functioning properly?
    • Reason: The belt providing power to the alternator was broken.
  • Why was the belt broken?
    • Reason: The belt had been worn to the point of needing to be replaced.
  • Why was the belt not replaced?
    • Reason: The car was not properly maintained.

The scenario above helps the team focus on diagnosing and understanding the problem. Each step poses a clear question that has to be answered before we can drill further down into the issue. Let’s now apply the same approach to a bug that I am currently working on. I’ll also break the steps down a bit more to show how I arrived at the answer to each question.

Issue: Application hangs when zooming out.

Why is the application hanging?

The first why can be the most difficult to answer, as it is a broad, generic question that takes some information gathering. Luckily, the hang showed up in a unit test that timed out after an hour of running. Because the unit test was nicely laid out in separate steps, it was easy to add some logging to see which part of the test was never reached. If this had been a report from a user, I would have had to ask what they were doing when the hang occurred and gather as much information as possible to recreate the issue locally.
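A minimal sketch of that kind of step logging, assuming the test can be split into named steps. The `run_steps` helper and the step names are hypothetical and not taken from the actual test suite. Each step logs before it starts and after it finishes, so the last "starting" line without a matching "finished" line points at the part that hung.

```python
import logging

logger = logging.getLogger("zoom_test")


def run_steps(steps):
    """Run (name, callable) pairs in order, logging around each one."""
    completed = []
    for name, step in steps:
        logger.info("starting step: %s", name)
        step()  # if this call never returns, the log ends at "starting"
        logger.info("finished step: %s", name)
        completed.append(name)
    return completed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    # Hypothetical steps standing in for the real test body.
    run_steps([
        ("load scene", lambda: None),
        ("pinch to zoom out", lambda: None),
        ("verify camera", lambda: None),
    ])
```

Timestamps in the log format also make it easy to see how long the test sat in the final step before the timeout fired.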

Once the last executed section of code was found, we could start identifying what was going on. In my case the application was stuck in the gesture handler. With that we had an answer to the first why.

Why was the gesture handler stuck?

Now we can move on to the second question. The gesture handler was being used to set a scaling factor that is applied to the world and the objects within it, so that more objects become visible to the camera. It was also used to center the camera relative to the midpoint of the two fingers performing the gesture. However, the gesture handler was returning an error while trying to re-project the center coordinate into screen space if the gesture was performed too fast.

Why was the coordinate failing when projecting to screen space?

We’re now getting a better idea of what the issue could be: a coordinate is failing to convert into screen space. Since this was a multi-threaded application, I started digging into the camera to see whether its viewport was not being updated correctly, and indeed that was the case.

Why was the camera not being updated?

We’re in the home stretch at question 4. I know that the camera and rendering logic run on a separate thread. After finding the correct thread, I could see that it was indeed inside a method that updates the camera. Further investigation showed that two threads had deadlocked over an object’s resources.

Why is an object being accessed across multiple threads?

Without getting too deep into the details of the fifth why, the resolution was simple: refactor the logic a bit so that the locks are freed earlier. More importantly, we can see how this technique helps focus on an issue. It might not speed up the debugging process, and some questions may need to be answered by teammates who know certain parts of the code better, but it will help you understand the problem and make you better at debugging in the long run.
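As a rough illustration of what "freeing the lock earlier" can look like, here is a sketch in Python. It is not the actual fix from the application: the `Camera` class, `project_to_screen`, and the viewport layout are all hypothetical. The idea is to hold the lock only long enough to copy the shared state, then do the projection math on that snapshot with no lock held.

```python
import threading


class Camera:
    """Hypothetical camera whose viewport is shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._viewport = (0.0, 0.0, 800.0, 600.0)  # x, y, width, height

    def set_viewport(self, viewport):
        with self._lock:
            self._viewport = viewport

    def viewport(self):
        # Hold the lock only long enough to copy the state out.
        with self._lock:
            return self._viewport


def project_to_screen(camera, world_point, scale):
    """Project a world point using a snapshot taken under the lock."""
    x0, y0, _width, _height = camera.viewport()  # lock already released
    wx, wy = world_point
    # The math runs without holding any lock, so this thread cannot
    # block another thread that needs the camera in the meantime.
    return ((wx - x0) * scale, (wy - y0) * scale)


if __name__ == "__main__":
    camera = Camera()

    def render_loop():
        for i in range(1000):
            camera.set_viewport((float(i), 0.0, 800.0, 600.0))

    def gesture_loop():
        for _ in range(1000):
            project_to_screen(camera, (10.0, 20.0), 0.5)

    threads = [threading.Thread(target=render_loop),
               threading.Thread(target=gesture_loop)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    print("all threads finished:", not any(t.is_alive() for t in threads))
```

Because the viewport is stored as an immutable tuple, the snapshot stays consistent even if the render thread replaces it right after the lock is released.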

Taming the snake: Learning Python and putting it to use

For the duration of my programming life I’ve been working with C, C++, and C#. Look around the programming space and you’ll see a lot of articles from people who know a multi-verse of languages and won’t even consider anyone a true programmer unless they have 25 different languages under their belt. So I figured I would venture down that trail and start with Python.

Going on an adventure with Python

I started taking some online classes from Codecademy to see what this is all about. What I noticed right off the bat is that I’m still stuck in my C-style ways: semicolons, and tab styles that cause scripts to break. I have an old project that was set up in C#, but seeing as I have moved to a Linux machine for work, I figured I would try a language that is a bit more cross-platform friendly right out of the box (maybe I’ll revisit having a C# application on Linux when Microsoft releases .NET Core Final Alpha Beta Release Candidate). I’m going to start porting various scripts over to my GitHub as the bits and pieces come through. Hopefully the next update will arrive before the end of the 3rd quarter.
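To make those C-style habits concrete, here is a small sketch. The snippet strings are made up for illustration and are not from the course. A trailing semicolon is legal in Python, just unnecessary, while mixing tabs and spaces in one block, or using braces for a block, is rejected before the script even runs.

```python
def compiles(source):
    """Return True if Python accepts the source, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError and TabError are subclasses
        return False


# A C-style trailing semicolon is tolerated, just unnecessary.
with_semicolon = "x = 1;\n"

# One line indented with a tab, the next with eight spaces.
mixed_indent = "if True:\n\tx = 1\n        y = 2\n"

# Braces instead of a colon and indentation do not form a block.
c_style_block = "if True {\n    x = 1\n}\n"

if __name__ == "__main__":
    for label, source in [("semicolon", with_semicolon),
                          ("mixed tabs and spaces", mixed_indent),
                          ("braces", c_style_block)]:
        print(label, "->", "ok" if compiles(source) else "rejected")
```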

Wise words from an old mentor

I had many mentors as a Padawan in the world of programming. They ranged from those who helped me take my first steps as a programmer with the famous “Hello World” program in C, C++, and C#, to those who taught me the more advanced techniques and theories of data structures and various design patterns. I also got thrown into the world of 2D and 3D graphics programming with OpenGL and DirectX. But the mentor who stood out the most taught me about Linux, embedded architecture, and the famous x86 assembler. He was very knowledgeable and gave me the following wise words to always remember as a programmer. He added his own spin on these rules, and they have stuck with me whenever I design new functionality or go back and improve existing code. Below are his words, adapted from Code Complete and from NASA’s 10 coding commandments:

  1. Comment your code with valuable comments, not useless ones.
  2. Don’t forget about error checking, since anything can fail and it will.
  3. Use less code (DRY: don’t repeat yourself).
  4. Write code that can be easily modified but don’t make it unreadable. Code can always be thrown away.
  5. Write code that can be easily tested. If you did not test it then consider it broken.
  6. Fix the problem, not the symptoms. The symptoms are the obvious artifacts of the problem; you need to find the problem.

NASA’s 10 coding commandments

  1. Restrict all code to very simple control flow constructs – do not use goto statements, setjmp or longjmp constructs, and direct or indirect recursion. (Well, all of these can be used in moderation if need be, but there must be a justification.)
  2. All loops must have a fixed upper-bound. It must be trivially possible for a checking tool to prove statically that a preset upper-bound on the number of iterations of a loop cannot be exceeded. If the loop-bound cannot be proven statically, the rule is considered violated. (Good idea, but not always possible; you must justify the exception.)
  3. Do not use dynamic memory allocation after initialization. (of course this assumes that during initialization you know exactly what memory you need, which might be difficult to determine)
  4. No function should be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function. (Good idea, but the limit is arguable.)
  5. The assertion density of the code should average to a minimum of two assertions per function. Assertions are used to check for anomalous conditions that should never happen in real-life executions. Assertions must always be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken, e.g., by returning an error condition to the caller of the function that executes the failing assertion. Any assertion for which a static checking tool can prove that it can never fail or never hold violates this rule (i.e., it is not possible to satisfy the rule by adding unhelpful “assert(true)” statements).
  6. Data objects must be declared at the smallest possible level of scope.
  7. The return value of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.
  8. The use of the pre-processor must be limited to the inclusion of header files and simple macro definitions. Token pasting, variable argument lists (ellipses), and recursive macro calls are not allowed. All macros must expand into complete syntactic units. The use of conditional compilation directives is often also dubious, but cannot always be avoided. This means that there should rarely be justification for more than one or two conditional compilation directives even in large software development efforts, beyond the standard boilerplate that avoids multiple inclusion of the same header file. Each such use should be flagged by a tool-based checker and justified in the code. (There could be a debate here, but since macros are not well checked this is a good idea.)
  9. The use of pointers should be restricted. Specifically, no more than one level of dereferencing is allowed. Pointer dereference operations may not be hidden in macro definitions or inside typedef declarations. Function pointers are not permitted. (No function pointers? Seems like the skill level expected here is suspect, but function pointers do prevent static analysis tools from working well.)
  10. All code must be compiled, from the first day of development, with all compiler warnings enabled at the compiler’s most pedantic setting. All code must compile with these settings without any warnings. All code must be checked daily with at least one, but preferably more than one, state-of-the-art static source code analyzer and should pass the analyses with zero warnings. (In some ways I agree, but suppressing the errors and warnings can cover up other problems.)
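These rules were written with C in mind, but a few of them carry over to other languages. Below is a loose Python sketch of rules 2, 5, and 7 together: a loop with a fixed upper bound, parameter checks inside the function, and an explicit error result instead of a crash. The `read_sensor` function and its behavior are hypothetical stand-ins for any call that can fail.

```python
MAX_RETRIES = 5  # fixed upper bound on the loop (rule 2)


def read_sensor(attempt):
    """Hypothetical device read: returns a value, or None on failure."""
    return 42 if attempt >= 2 else None


def read_with_retry(read=read_sensor, max_retries=MAX_RETRIES):
    """Return (ok, value); never loops more than MAX_RETRIES times."""
    # Rule 7: check the validity of parameters inside the function.
    if not callable(read) or not 0 < max_retries <= MAX_RETRIES:
        return (False, None)

    for attempt in range(max_retries):  # rule 2: bounded iteration count
        value = read(attempt)
        # Rule 7: check the return value before using it.
        if value is not None:
            return (True, value)

    # Rule 5: an anomalous condition ends in an explicit error result
    # handed back to the caller rather than an unhandled failure.
    return (False, None)


if __name__ == "__main__":
    ok, value = read_with_retry()
    print("read succeeded" if ok else "read failed", value)
```

The caller is expected to check the `ok` flag before touching `value`, which is the Python-flavored equivalent of checking a C return code.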

Do you agree with these rules or are they outdated? Do you have other rules that you abide by that help make you a better programmer? Leave your comments below.

 

The above statements came from a post from my mentor Gary Miller. May he rest in peace.