Debugging: Navigating the Maze

There is a simple and fundamental process for finding bugs:

What do I think the problem is? Check if that’s the problem.

The process has plenty of nuance, and understanding an issue in a real system can get complicated, but every step of it can be broken down into this one statement. Let’s explore it in more detail.

This post is part of our Debugging Series 2021

Start with an assumption

As has been stated before, all bugs arise from false assumptions.

Assumptions of state: That a value at a given point in time is what we expect it to be
- Email address is "user@example.com"
- The amount being deposited in a transaction is $100
- The ENV variable for the API endpoint is "api.catmemes.com"
Assumptions of events: That the expected reaction to an event occurs
- System returns a confirmation when a successful login happens, and an error when an invalid login attempt occurs
- System turns on the heater element when the temperature gauge drops below a certain value
Assumptions of execution flow: Which code paths are actually executing?
- Comments in code that say // This should never happen!
- Not handling a case that can actually occur
Assumptions of code presence: That the code we think was written is actually there
- We think there is code to hide avatars in guest mode, but there isn’t
  - Didn’t execute a push
  - Overwrote a commit
  - Missed a TODO item
Assumptions of reality: The rabbit hole of fundamental truths to our current reality
- Time only moves forward, time zone offsets are always whole hours, every road has a name
- Other dangerous-time-consuming-infinite-darkness-dwelling dragons (but sometimes we need to slay dragons yo!)
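One way to make these categories concrete is to turn an assumption into something the code checks for you. The Python sketch below is illustrative only: `check_state`, `heater_should_run`, `login_response`, the `API_ENDPOINT` variable, and the 18-degree threshold are hypothetical names and values, not part of any real system.

```python
import os
from typing import Optional


def check_state(name: str, actual: object, expected: object) -> bool:
    """Assumption of state: a value is what we expect it to be right now."""
    if actual != expected:
        print(f"FALSE: {name} is {actual!r}, expected {expected!r}")
        return False
    print(f"holds: {name} is {actual!r}")
    return True


def heater_should_run(temperature_c: float, threshold_c: float = 18.0) -> bool:
    """Assumption of events: the heater turns on below the threshold.
    The 18.0 degree default is a hypothetical value."""
    return temperature_c < threshold_c


def login_response(credentials_valid: Optional[bool]) -> str:
    """Assumption of execution flow: only two outcomes are possible."""
    if credentials_valid is True:
        return "confirmation"
    if credentials_valid is False:
        return "error"
    # Rather than a "// This should never happen!" comment, fail loudly
    # with the value that slipped through.
    raise RuntimeError(f"This should never happen: {credentials_valid!r}")


if __name__ == "__main__":
    # API_ENDPOINT is a hypothetical environment variable name.
    check_state("API_ENDPOINT", os.environ.get("API_ENDPOINT"), "api.catmemes.com")
    check_state("deposit amount", 100, 100)
    print(heater_should_run(15.0))
    print(login_response(False))
```

Each check either confirms the assumption or tells you exactly which value broke it, which is the "check it" half of the process.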

Check it

The debugging flow chart.

We begin with our first guess at what the problem is, meaning which assumption about the code or system might be false, and check it. Checking can take many forms, but the premise is always the same: we actively prove or disprove our suspicions so we can revise them and repeat the process. We follow this flow for the entire debugging session until we either find the bug or take alternative action (more on that later). It resembles the myth of Theseus and the Minotaur, in which Theseus unwound a ball of yarn behind him to keep track of where he had already been in the labyrinth while hunting the Minotaur that lived there.
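Here is the guess-and-check loop written out as a minimal Python sketch. The `Guess` class, the `guess_and_check` function, and the `trail` list (our ball of yarn) are hypothetical names chosen for illustration; in practice each check might be a print statement, a debugger breakpoint, or a failing test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Guess:
    """One assumption we suspect might be false."""
    description: str
    holds: Callable[[], bool]  # returns True if the assumption checks out


def guess_and_check(guesses: List[Guess], trail: List[Tuple[str, bool]]) -> Optional[str]:
    """Check guesses in order, tying every result onto the trail so an
    explored path is never walked twice. Returns the first assumption that
    turns out to be false, or None if every guess held."""
    explored = {description for description, _ in trail}
    for guess in guesses:
        if guess.description in explored:
            continue  # already checked on an earlier pass
        result = guess.holds()
        trail.append((guess.description, result))
        if not result:
            return guess.description  # proved false: this is our lead
    return None  # all held: revise the list of guesses and go again


if __name__ == "__main__":
    trail: List[Tuple[str, bool]] = []
    guesses = [
        Guess('email address is "user@example.com"', lambda: True),
        Guess("deposit amount is $100", lambda: False),
    ]
    print(guess_and_check(guesses, trail))  # deposit amount is $100
    print(trail)
```

Returning `None` is the "revise" step: when every guess holds, the assumptions list itself needs another look.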

The five minute problem

Initially, take the quickest and most promising route.

Within the first 5 to 10 minutes of debugging, don’t add extra process. Focus on your best guess as to where the bug is, given the current information, and quickly check those assumptions. Follow the flow above and remember to tie down your yarn when you change directions (i.e., make a quick note of what you tried).

The medium problem

Take stock of the pathways you’ve explored and formulate a plan for branching out.

If 5 to 10 minutes of this process haven’t turned up the root cause, adapt your strategy. Slow down and start refining your guess-and-check process. This is the time to list more of your assumptions. Think of it as a game where you are mapping a creepy, dark dungeon.

Focus on exposing your assumptions around the problem with the information you already have from your initial phase of checking.
Which have you already checked? Mark these as explored.
Sort the remaining, unchecked assumptions into two categories based on how costly they are to check.
- Easy: These are assumptions that you can quickly check with little effort
- Hard: These are more complex assumptions, possibly with nested hidden assumptions, that require more time and effort to check

Once you’ve categorized your unchecked assumptions into “Easy” and “Hard” categories, resume your guess-and-check flow from before. Start with the “Easy” assumptions and scale up your efforts as you go.
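The bookkeeping in this phase amounts to a filter and a sort. A minimal sketch, assuming each assumption is tagged with a hypothetical `cost` field of `"easy"` or `"hard"` and an `explored` flag; `Assumption` and `next_to_check` are illustrative names.

```python
from dataclasses import dataclass
from typing import List

COSTS = ("easy", "hard")  # check order: easy first, then hard


@dataclass
class Assumption:
    description: str
    cost: str               # "easy" or "hard"
    explored: bool = False  # set to True once it has been checked

    def __post_init__(self) -> None:
        if self.cost not in COSTS:
            raise ValueError(f"cost must be one of {COSTS}, got {self.cost!r}")


def next_to_check(assumptions: List[Assumption]) -> List[Assumption]:
    """Skip explored assumptions and order the rest easy-first.
    sorted() is stable, so order within each category is preserved."""
    unchecked = [a for a in assumptions if not a.explored]
    return sorted(unchecked, key=lambda a: COSTS.index(a.cost))


if __name__ == "__main__":
    plan = next_to_check([
        Assumption("the heater reacts to the gauge", "hard"),
        Assumption("the ENV variable is set", "easy", explored=True),
        Assumption("the push actually happened", "easy"),
    ])
    for a in plan:
        print(a.cost, a.description)
```

Keeping the original order within each category lets you still put your strongest hunch first among the easy checks.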

The major problem

Play the long game or don’t play at all.

After a few hours you’ve ventured into “Mega-confusing-bug” territory. This is no mere first-level dungeon; it is a veritable maze and will require some serious adventuring skills to navigate successfully. One does not simply walk into it. At this point you have quite a lot of information about where and what the bug is “not”. This might be frustrating and confusing:

The scariness of bugs increases exponentially the more places you check and don't find them.

It’s time to slow down and course correct before rushing into a sticky situation.

An excavator tries to free the Ever Given ship from the Suez Canal.

It wasn’t until they ran bundle update that Komatsu realized this was a bad day to show up to work.

If you went too far too fast and find yourself aground in a channel blocking traffic, we’ve written about initial strategies for getting out. On top of that, it’s time to start considering alternative action to resolve the issue. Let’s do a cost analysis of your remaining options.

Fix the bug:

Setting up longer experiments: You may be facing a bug you don’t yet have enough data on, and you need a longer-running experiment to check a more complex assumption.
Patching the bug without fixing the root issue: Would it be more cost-effective to patch the issue first? Can you put a band-aid in place that holds things together while you explore the deeper issue?
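A band-aid can double as the start of a longer-running experiment. The sketch below assumes a hypothetical deposit function that occasionally receives a non-positive amount from some unknown source: it refuses the bad value so the balance stays intact, logs it, and tallies each occurrence so there is data to check later. `patched_deposit` and `unexpected_amounts` are illustrative names, not a real API.

```python
import logging
from collections import Counter

logger = logging.getLogger("deposits")

# Longer-running experiment: tally each unexpected amount the patch absorbs,
# so there is data to check the deeper assumption later.
unexpected_amounts: Counter = Counter()


def patched_deposit(balance_cents: int, amount_cents: int) -> int:
    """Band-aid: refuse non-positive deposits instead of letting them reach
    the balance. Where the bad amounts come from is still unknown."""
    if amount_cents <= 0:
        unexpected_amounts[amount_cents] += 1
        logger.warning(
            "ignored deposit of %d cents; root cause still unknown", amount_cents
        )
        return balance_cents  # leave the balance untouched
    return balance_cents + amount_cents


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    balance = patched_deposit(0, 10_000)  # $100
    balance = patched_deposit(balance, -500)
    print(balance, dict(unexpected_amounts))
```

The warning keeps the patch honest: the symptom is contained, but the bug is still visible in the logs rather than silently swallowed.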

Don’t fix the bug:

Some bugs are both un-patchable and not worth fixing. Bugs that cause no real harm to the system and occur only rarely fall into this category. These bugs are annoying not because they cause actual problems but simply because they exist. There will always be messes; some are best left to exist in peace.

Tying it all together

With bugs, the process is simple: guess and check. Start with the easy options closest to the observed bug. When a solution is not immediately apparent, list your assumptions and find the next easiest set to check. Bring your yarn ball. When you have spent significant time on the bug, evaluate long-term options and execute a new strategy.
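As a rough summary, the escalation from the five-minute problem to the major problem can be written as a function of time spent. The 10- and 120-minute cutoffs are assumptions standing in for "the first few minutes" and "a few hours"; `debugging_phase` is a hypothetical helper, and the thresholds are meant to be tuned to your own work.

```python
# Hypothetical cutoffs, in minutes, standing in for "the first few minutes"
# and "a few hours". Tune them to your own team and codebase.
QUICK_LIMIT = 10
MAJOR_THRESHOLD = 120


def debugging_phase(minutes_spent: float) -> str:
    """Map time spent on a bug to the strategy for that phase."""
    if minutes_spent < 0:
        raise ValueError("minutes_spent cannot be negative")
    if minutes_spent <= QUICK_LIMIT:
        return "five-minute: check your best guess, note what you tried"
    if minutes_spent < MAJOR_THRESHOLD:
        return "medium: list assumptions, mark explored ones, easy before hard"
    return "major: weigh a longer experiment, a patch, or leaving it alone"


if __name__ == "__main__":
    for minutes in (5, 45, 240):
        print(minutes, debugging_phase(minutes))
```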

Finally, remember this:

You will always find the bug in the last place you check.

Happy dungeon crawling!

Debugging Series

This post is part of our ongoing Debugging Series 2021 and couldn’t have been accomplished without the wonderful insights from interviews with the following people:

Keep tuning in every week for more great debugging tips.

About thoughtbot

Let’s make your product and team a success