UP | HOME

Playing detective

I was minding my own business as usual, testing out a GQL server, when I encountered an error. A "helpful" log entry in my STDOUT went something like this:

{
  "level": "error",
  "panic": "reflect: call of reflect.Value.IsNil on string Value",
  "time": 1674143536,
  "message": "Panic at the Disco!"
}

Quite helpful, wouldn't you say? The error message is explained by this section of the docs:

IsNil reports whether its argument v is nil. The argument must be a chan, func, interface, map, pointer, or slice value; if it is not, IsNil panics. Note that IsNil is not always equivalent to a regular comparison with nil in Go. For example, if v was created by calling ValueOf with an uninitialized interface variable i, i==nil will be true but v.IsNil will panic as v will be the zero Value.

Okay so far, only problem is - where exactly is this happening? Not anywhere in code I've written, a quick grep confirmed that. So of the 3 trillion dependencies and their sub-dependencies, which one is this error propagating from?

If only there was a way to know. Some sort of…stack trace? Wouldn't that be nice, if such a thing existed?

The problem is: somewhere in the call stack, something is recover()ing, and then simply formatting the value given by the panic. No effort is made to retrieve the stack of the panic, so that valuable information is lost.

The error message was certainly helpful - at least we're not entirely clueless about what's happening. But it would be nice to know where it's happening.

Unfortunately in this case we have to play detective. Of the libraries I'm using, gqlgen and bun were the primary suspects. Initially, I thought I could find where the recover is being called and try and get the stack trace myself, but that looked like it'd be more work.

Out of intuition, I started with bun - rightly so it turned out. I cloned the repo, checked out the release commit to match the version I was using, and updated my project's go.mod with this line:

replace github.com/uptrace/bun v1.1.9 => /home/automaton/golang/bun

Now in my local copy of bun, it was time to figure out where exactly IsNil was being called and causing a panic. I grepped the entire codebase for instances of "IsNil", then filtered out instances where it would be impossible to cause the panic, such as if v.Kind() == reflect.Ptr && v.IsNil().

I print-debugged the remaining suspects, adding a line like this just before the suspected line:

fmt.Printf("<<<<<<")
dest.IsNil()
fmt.Printf(">>>>>>\n")

Successful calls would look like <<<<<<>>>>>>, and obviously for the unsuccessful ones we'd get a panic after printing <<<<<<.

It worked, and I caught my the perpetrator. Case closed.

But none of this detective work would have been necessary, if I had some sort of stacktrace. If the error turned out to be coming from gqlgen instead, I'd have had to do the same thing over again with gqlgen, etc.

Lesson learned? Please contextualize your error messages.

Date: 2023-01-20

Author: Brian Kamotho