Debugging an issue in code can be a time-consuming and difficult process. It gets even harder when the bug is caused by a race condition in a multi-threaded application, or when you are thrown into an unfamiliar code base. However, there is a methodology that can help you navigate this madness: the 5 Whys technique.
The 5 Whys technique originated at Toyota as a way of tracing a problem back to its root cause. Below is the example from the Wikipedia article on the technique, showing how the method plays out when a car comes in with an issue.
Issue: Car has entered the shop and does not want to start.
- Why is the car not starting?
- Reason: The battery is dead.
- Why is the battery dead?
- Reason: The alternator was no longer charging the battery.
- Why was the alternator not functioning properly?
- Reason: The belt providing power to the alternator was broken.
- Why was the belt broken?
- Reason: The belt had been worn to the point of needing to be replaced.
- Why was the belt not replaced?
- Reason: The car was not properly maintained.
The above scenario helps the team focus on diagnosing and understanding the problem. Each step poses one clear question that has to be answered before we can drill further down into the issue. Let’s now apply the same approach to a bug that I am currently working on. I’ll also break down the steps a bit more to show how I arrived at the answer to each question.
Issue: Application hangs when zooming out.
Why is the application hanging?
The first why can be the most difficult to answer, because the question is broad and you have little information to go on. Luckily, this hang showed up in a unit test that timed out after an hour of running. Because the unit test was nicely laid out in separate steps, it was easy to add logging to see which part of the test was never reached. If this had been a report from a user, I would have had to ask what they were doing when the hang occurred and gather as much information as possible to recreate the issue locally.
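The step-by-step logging described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the actual test: the logger name, `run_step`, and `last_unfinished_step` are all hypothetical names made up for the example.

```python
import logging

logger = logging.getLogger("zoom_test")  # hypothetical logger name


def run_step(name, step):
    """Run one test step, logging before and after it.

    If the step hangs, the log ends with a 'starting' line that has no
    matching 'finished' line, which points at the step that never returned.
    """
    logger.info("starting: %s", name)
    result = step()
    logger.info("finished: %s", name)
    return result


def last_unfinished_step(log_lines):
    """Return the first step that was started but never finished, or None."""
    started = [line.split(": ", 1)[1]
               for line in log_lines if line.startswith("starting: ")]
    finished = {line.split(": ", 1)[1]
                for line in log_lines if line.startswith("finished: ")}
    for name in started:
        if name not in finished:
            return name
    return None
```

Wrapping each stage of the test in `run_step` means that a timed-out run leaves a log whose last "starting" line names the stage to look at first.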
Once the last section of code to execute was found, we could start identifying what was going on. In my case the application was stuck in the gesture handler. With that we had an answer to the first why.
Why was the gesture handler stuck?
Now we can move on to the second question. The gesture handler was being used to set a scaling factor that is applied to the world and the objects within it, so that more objects become visible to the camera. It was also used to center the camera on the midpoint between the 2 fingers performing the gesture. However, the gesture handler was returning an error while trying to re-project the center coordinate into screen space if the gesture was performed too fast.
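To make the two jobs of the gesture handler concrete, here is a small Python sketch of how a scaling factor and a center point can be derived from two touch points. It assumes the scale is the ratio of the current finger distance to the previous one, which is a common approach but not necessarily what the real handler does. `pinch_update` and its parameters are hypothetical names.

```python
import math


def pinch_update(prev_a, prev_b, cur_a, cur_b):
    """Return (scale, center) for one update of a two-finger gesture.

    Each argument is an (x, y) touch position. The scale is the ratio of
    the current finger distance to the previous one, so a value below 1
    means the fingers moved together (zooming out).
    """
    prev_dist = math.dist(prev_a, prev_b)
    cur_dist = math.dist(cur_a, cur_b)
    if prev_dist == 0:
        raise ValueError("previous touch points coincide; scale is undefined")
    scale = cur_dist / prev_dist
    # The center is the midpoint between the two current touch points.
    center = ((cur_a[0] + cur_b[0]) / 2, (cur_a[1] + cur_b[1]) / 2)
    return scale, center
```

The center returned here is still in touch coordinates. In the application it then has to be projected through the camera, and that projection is the step that was failing.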
Why was the coordinate failing when projecting to screen space?
We’re now getting a better idea of what the issue might be: a coordinate is failing to project into screen space. Since this was a multi-threaded application, I started digging into the camera to see whether the camera viewport was not being updated correctly, and indeed that was the case.
Why was the camera not being updated?
We’re on the home stretch with question 4. Here I know that the camera and rendering logic runs on a separate thread. After investigating and finding the correct thread, I can see that we are indeed inside a method that is updating the camera. Further investigation shows that 2 threads have reached a deadlock over an object’s resources.
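Finding the correct thread usually comes down to looking at what every thread is doing at the moment of the hang. In a debugger that is the thread list, and in code it can be a stack dump. The sketch below shows the idea in Python using only the standard library. It illustrates the technique and is not the tooling used on the real application. `dump_thread_stacks` is a hypothetical helper name.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a dict mapping thread name to its formatted call stack.

    Two threads that are both parked on a lock acquisition, each inside
    code that already holds the other's lock, are the signature of a
    deadlock.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    # sys._current_frames() maps thread id -> topmost frame of that thread.
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, f"unknown-{ident}")
        stacks[name] = "".join(traceback.format_stack(frame))
    return stacks
```

Calling this from a watchdog thread or a test timeout handler shows which method each thread is sitting in, which is the same information I was reading off the thread list while investigating.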
Why is an object being accessed across multiple threads?
Without going too far into the details of the fifth why, the resolution was simple: refactor the logic a bit so that the locks are released earlier. More importantly, we can see how this technique helps you focus on an issue. It might not speed up the debugging process, and some questions may need to be answered by teammates who know certain parts of the code better, but it will help you understand the problem and make you better at debugging in the long run.
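As a rough illustration of what "freeing the locks earlier" can look like, here is a Python sketch of one common pattern: hold the lock only long enough to copy the shared state, then do the real work on the copy. This is not the actual fix from the application. The `Camera` class, its fields, and `project_to_screen` are hypothetical, and the projection math is deliberately simplified.

```python
import threading


class Camera:
    """Hypothetical camera whose viewport is shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._viewport = (0.0, 0.0, 1.0)  # x offset, y offset, scale

    def set_viewport(self, x, y, scale):
        with self._lock:
            self._viewport = (x, y, scale)

    def snapshot(self):
        # The lock is held only for the copy. The tuple is immutable,
        # so the caller can keep using it after the lock is released.
        with self._lock:
            return self._viewport


def project_to_screen(camera, world_point):
    """Project a world point using a snapshot of the viewport.

    No lock is held while the math runs, so this code cannot end up
    waiting on a second lock while still holding the camera's lock.
    """
    x, y, scale = camera.snapshot()
    return ((world_point[0] - x) * scale, (world_point[1] - y) * scale)
```

The trade-off is that the projection may use a viewport that is one update old, which is often acceptable for a gesture that is refreshed every frame.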