Saturday, March 25, 2006

Finding/counting instances of (leaked) C++ classes in crash dumps using WinDbg.

When dealing with memory leaks you have an armoury of stuff at your disposal: debugging versions of memory allocation routines, leak tracking tools from Microsoft (see DebugDiag and UMDH) and others.

Another technique I've used with some success on full crash dumps, when for various reasons the tools above aren't immediately available, is to go hunting for instances of C++ classes that I suspect are being leaked.

Note that this technique only works for classes that contain virtual functions, and you'll need your symbols to be lined up properly.

Consider how an instance of a C++ class (with virtual functions) is typically laid out in memory: it starts with a pointer to the class's virtual function table, followed by the class's member data. The values stored in the member data vary from instance to instance, but the pointer to the virtual function table is the same for every instance of the class.
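To see the invariant for yourself, here's a small C++ sketch. Shape and Square are made-up stand-ins for CMyClass, and first_word is a hypothetical helper that just copies the first pointer-sized word out of an object. Treating that word as the vftable pointer is an assumption about the compiler's object layout (it holds for MSVC and for the Itanium C++ ABI used by GCC and Clang); the C++ standard itself doesn't promise it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical polymorphic class standing in for CMyClass.
class Shape {
public:
    explicit Shape(int id) : id_(id) {}
    virtual ~Shape() = default;
    virtual int area() const { return 0; }
    int id() const { return id_; }
private:
    int id_;  // member data: differs from instance to instance
};

// A derived class gets its own vftable.
class Square : public Shape {
public:
    Square(int id, int side) : Shape(id), side_(side) {}
    int area() const override { return side_ * side_; }
private:
    int side_;
};

// Copies the first pointer-sized word out of an object's storage.
// Under the MSVC / Itanium layouts this is the vftable pointer.
std::uintptr_t first_word(const void* object) {
    std::uintptr_t word = 0;
    std::memcpy(&word, object, sizeof word);
    return word;
}
```

Two Shape objects with different ids give the same first word, and so do two Squares; a Shape and a Square give different words, because each class has its own table.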

That's nice: it means that if we know where in memory the virtual function table for a particular class lives, we can go hunting in memory for instances of that class. WinDbg will tell us where the table is; the x (examine symbols) command is what we need here. Let's say we want to find the address of the virtual function table for class CMyClass in module mylibrary.dll. We can ask WinDbg by executing the command

x mylibrary!CMyClass::*vftable*

The * characters above are wildcards. I find * faster to type than the more correct ` (backtick) character, and the wildcards also show you straight away whether there's more than one virtual function table associated with the class.

WinDbg will list the addresses of all the virtual function tables associated with CMyClass; there may be more than one if you are dealing with multiple inheritance and/or nested classes. For now, let's assume that there's only one. You should see something like this:

0123abc4 mylibrary!CMyClass::`vftable' =

Thus we know that we need to look for the address 0123abc4. We also need to know where to look in memory. Microsoft's DebugDiag tool is handy here: running its memory analysis script over your dump will tell you which heaps belong to which modules, where their segments are located in memory, and other interesting things.
The WinDbg !heap extension command is also useful for listing the heap segments, but it won't tell you which heap belongs to which module.
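If you end up scripting the hunt, pulling the vftable address out of the x output is mechanical. This C++ sketch assumes the simple one-line form shown above (address, then module!Class::`vftable'); VftableSymbol and parse_vftable_line are hypothetical helpers, and any line that carries extra qualifiers after `vftable' (as the tables for multiple inheritance may) is simply rejected rather than parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical result type for one vftable symbol listed by WinDbg's x command.
struct VftableSymbol {
    std::uint64_t address;   // where the vftable lives, e.g. 0x0123abc4
    std::string module;      // e.g. "mylibrary"
    std::string class_name;  // e.g. "CMyClass"
};

// Parses a line of the simple form
//   0123abc4 mylibrary!CMyClass::`vftable' =
// and returns std::nullopt for anything else.
std::optional<VftableSymbol> parse_vftable_line(const std::string& line) {
    std::istringstream in(line);
    std::string addr_text, symbol;
    if (!(in >> addr_text >> symbol)) return std::nullopt;

    // 64-bit targets print addresses as 00007ff8`12345678: drop the backtick.
    std::string hex;
    for (char c : addr_text)
        if (c != '`') hex += c;

    // The symbol must be module!Class::`vftable' with nothing after it.
    const std::string suffix = "::`vftable'";
    const std::size_t bang = symbol.find('!');
    if (bang == std::string::npos || symbol.size() < suffix.size())
        return std::nullopt;
    const std::size_t tail = symbol.size() - suffix.size();
    if (symbol.compare(tail, suffix.size(), suffix) != 0 || tail <= bang + 1)
        return std::nullopt;

    VftableSymbol result;
    try {
        std::size_t used = 0;
        result.address = std::stoull(hex, &used, 16);
        if (used != hex.size()) return std::nullopt;  // trailing junk
    } catch (const std::exception&) {  // invalid_argument or out_of_range
        return std::nullopt;
    }
    result.module = symbol.substr(0, bang);
    result.class_name = symbol.substr(bang + 1, tail - bang - 1);
    return result;
}
```

Fed the example line above, it yields address 0x0123abc4, module "mylibrary" and class "CMyClass".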

Armed with these heap segment addresses and a virtual function table address, you can go hunting for instances of your class in the appropriate segment(s) in memory.
The WinDbg s (search memory) command is what we need here, and it's worth opening a WinDbg log file (with .logopen) before we start, in case there's a lot to be found. Say we have discovered that mylibrary has a nice big heap segment from 0d000000 to 0d400000; we can search it for CMyClass instances with the command

s -d 0d000000 0d400000 0123abc4


Of course, it helps to have a reasonable idea of how many CMyClass instances to expect at this stage of the game, so you can tell if you find too many. Note also that if you do have a leak, it's possible (likely, even) that the leaked instances will be splattered across multiple heap segments, so you'll want to search all the heap segments associated with your module.
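Here's roughly what the search boils down to, sketched in C++ over copies of segment memory rather than a live dump. Segment and find_dword are hypothetical; the sketch assumes a 32-bit target whose byte order matches the machine running it (x86 is little-endian) and, since heap blocks start on aligned boundaries, it only checks 4-byte-aligned offsets. It illustrates the idea; it is not how WinDbg's s command is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical copy of one heap segment: base address in the target
// process plus the raw bytes read from the dump.
struct Segment {
    std::uint32_t base;
    std::vector<unsigned char> bytes;
};

// Returns every target address, across all segments, at which the 32-bit
// value `pattern` occurs on a 4-byte boundary.
std::vector<std::uint32_t> find_dword(const std::vector<Segment>& segments,
                                      std::uint32_t pattern) {
    std::vector<std::uint32_t> hits;
    for (const Segment& seg : segments) {
        assert(seg.base % 4 == 0);  // segments are expected to be aligned
        for (std::size_t off = 0; off + 4 <= seg.bytes.size(); off += 4) {
            std::uint32_t value;
            std::memcpy(&value, seg.bytes.data() + off, 4);  // host byte order
            if (value == pattern)
                hits.push_back(seg.base + static_cast<std::uint32_t>(off));
        }
    }
    return hits;
}
```

The size of the returned vector is the count to compare with what you expected. Treat each hit as a candidate rather than proof: any DWORD that happens to hold the vftable address will match.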

I often use a Perl script to convert the output of the !heap command into a little file of WinDbg commands to feed into WinDbg via the $$< command.
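The Perl script isn't shown, so here is a hypothetical C++ equivalent of its output half only: given segment start/end addresses that you've already pulled out of the !heap output (by hand or otherwise), it writes one s -d line per segment. hex8 and build_search_script are made-up names, addresses are assumed to be 32-bit, and parsing the !heap output itself is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Formats a 32-bit value as 8 lowercase hex digits, e.g. 0d000000.
std::string hex8(std::uint32_t value) {
    std::ostringstream out;
    out << std::hex << std::setw(8) << std::setfill('0') << value;
    return out.str();
}

// Builds the text of a WinDbg command file: one `s -d start end vftable`
// search per heap segment.
std::string build_search_script(
        const std::vector<std::pair<std::uint32_t, std::uint32_t>>& segments,
        std::uint32_t vftable) {
    std::string script;
    for (const auto& [start, end] : segments)
        script += "s -d " + hex8(start) + " " + hex8(end) + " " + hex8(vftable) + "\n";
    return script;
}
```

Save the returned text to a file and run it with $$< followed by the file's path, with a log file open so you can count the hits afterwards.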

One other useful thing: often you may not know which class (or classes) are being leaked. You might get lucky with the information DebugDiag provides about allocation sizes: it tells you the most frequently occurring allocation sizes, and the sizes of the allocations associated with the greatest memory consumption. You can evaluate a C++ sizeof(...) expression in WinDbg with the ?? (C++ expression evaluator) command thus:

?? sizeof(mylibrary!CMyClass)

to see if the size of instances of CMyClass agrees with what DebugDiag tells you about the most frequently allocated size. A better thing to do is to ask WinDbg to tell you the names of ALL the classes in your module, then run sizeof on each of them.

x mylibrary!*vftable*

is your friend here if you have more classes to deal with than you can comfortably enumerate by hand. Note that this won't list all the C++ classes, just those with virtual functions. I use a Perl script to produce a list of class name/instance size pairs sorted by size, which gives a good list of candidate classes to start hunting for.
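That Perl script isn't shown either, so here is a hypothetical C++ sketch of the idea. It takes class name/size pairs (the results of ?? sizeof(...) for each class that x turned up), sorts them by size, and picks out the classes whose size exactly matches an allocation size DebugDiag reported. ClassSize, sort_by_size and candidates_for are made-up names, and exact matching assumes the leaked objects come from a plain new of a single object, so the requested allocation size equals sizeof; arrays or custom allocators would break that assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ClassSize {
    std::string name;  // e.g. "mylibrary!CMyClass"
    std::size_t size;  // result of ?? sizeof(mylibrary!CMyClass)
};

// Sorts by instance size, then by name so equal sizes list predictably.
void sort_by_size(std::vector<ClassSize>& classes) {
    std::sort(classes.begin(), classes.end(),
              [](const ClassSize& a, const ClassSize& b) {
                  return a.size != b.size ? a.size < b.size : a.name < b.name;
              });
}

// Names of classes whose instance size equals a reported allocation size:
// the first places to go hunting.
std::vector<std::string> candidates_for(const std::vector<ClassSize>& classes,
                                        std::size_t allocation_size) {
    std::vector<std::string> names;
    for (const ClassSize& c : classes)
        if (c.size == allocation_size) names.push_back(c.name);
    return names;
}
```

If DebugDiag says 0x18-byte allocations dominate, candidates_for(classes, 0x18) gives the vftables worth searching for first.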

As I said originally, I've had some success with this "hunt the vftable" technique. Your mileage may vary. Happy hunting!

Friday, March 24, 2006

First go...recovering a call stack in WinDbg

I thought I'd write a bit about the fun I've had with debugging. This is mostly for my benefit, but if it helps you too, then whoopedoo.

If you're not already familiar with WinDbg, two really great places to go for information are John Robbins's excellent book "Debugging Applications for Microsoft .NET and Microsoft Windows" and Tess Ferrandez's blog "If broken it is, fix it you should".

Mostly I've been wrasslin' with Microsoft's WinDbg, trawling through crash dumps with my "Wha' 'appen?" hat on. These sessions tend to come in fits and starts, as there are occasional (blessed) hiatuses when the system doesn't need my undivided attention. After a gap between debugging sessions, the anaesthetics of time and family life conspire to drain my memory, and I usually spend the first few minutes of analyzing a new dump in an unproductive haze. I've done this often enough to think "I gotta write some of this stuff down somewhere I can find it again."

I tend to deal with crash dumps from COM+ packages that have gone south, and one of the things I forget most frequently is "How do I recover the stack trace when COM+ has caught an exception?"

First things first: get your symbols straight. This used to be fiddly (for me, being a bear of very little brain), but since Microsoft made their symbols available on the net and made the symbol server available to us reg'lar folks, it's pretty straightforward. Read the docs, point at Microsoft's symbols, set your own symbol server up and move along. You only gotta do this once, and if you're really lucky you can get one of your minions (sorry, Neil :) colleagues to do it for you. It's a public service, and we all feel the benefit.

An early port of call is Microsoft's excellent DebugDiag tool, available from Microsoft (or google "IIS Diagnostics Toolkit"). Running DebugDiag's crash and memory analysis scripts can show a nice recovered call stack following an exception. This is fine, and with a following wind might even be all you need. However, when you are not blessed weatherwise, it's time to break out WinDbg and have a good ol' root 'round.

When you've got your dump loaded into WinDbg and the faulting thread shows that you've ended up with an exception, the call stack doesn't show the calls leading up to the instruction that threw the exception. What you'd like to see is the nice "recovered" call stack that DebugDiag shows you, and poke around on that stack to get some more info about how things got to be so broken. To reach this happy (or at least happier) state, you need to use the .cxr WinDbg command.

I can usually remember this much. But then I come unglued. How do I figure out what argument to feed to the .cxr command to get my nice happy call stack back? I know I've seen this written down somewhere, but what with age, the attention span of a butterfly, etc., I can't remember where, so I keep having to find out again. What I remember now (before writing it down here, where I promise I'll look in future) is that I can work it out by reading DebugDiag's crash dump analysis scripts. This is still pretty tedious (if you need a clue, I start by grepping for "recovered call stack" or some such).

When you've had an exception in a COM+ package, one way of finding the address to feed to the .cxr command (which you can divine from DebugDiag's crash dump analysis script) is to look at the call stack you currently have with the trusty kb command. If you can spot a call to comsvcs!ComSvcsExceptionFilter, you're in good shape. The first argument to this call is a pointer to an exception structure. Dump the first few DWORDs of this structure with the dd command thus:

dd xxxxxxxx

where xxxxxxxx is the value of the first argument to comsvcs!ComSvcsExceptionFilter. The second DWORD dumped by the dd command above is what you need to provide as the argument to the .cxr command to get the original exception-throwing call stack back with another kb command.
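Why the second DWORD? Assuming the exception structure handed to comsvcs!ComSvcsExceptionFilter is a standard Win32 EXCEPTION_POINTERS (which fits the recipe above, and is the documented parameter type of kernel32's UnhandledExceptionFilter), it holds just two pointers: ExceptionRecord followed by ContextRecord. The second is the CONTEXT that .cxr wants. The C++ sketch below mirrors that layout with stand-in types so it compiles without windows.h; ExceptionPointersMirror, ExceptionRecordStub, ContextStub and context_record_offset are hypothetical names, not the real Windows declarations.

```cpp
#include <cassert>
#include <cstddef>

// Opaque stand-ins for EXCEPTION_RECORD and CONTEXT (hypothetical names).
struct ExceptionRecordStub;
struct ContextStub;

// Mirrors the layout of EXCEPTION_POINTERS: two pointers, in this order.
struct ExceptionPointersMirror {
    ExceptionRecordStub* ExceptionRecord;  // what went wrong, and where
    ContextStub* ContextRecord;            // registers at the fault: feed this to .cxr
};

// Byte offset of the pointer you hand to .cxr.
constexpr std::size_t context_record_offset() {
    return offsetof(ExceptionPointersMirror, ContextRecord);
}
```

The offset is one pointer width. On a 32-bit process that is one DWORD, hence the second DWORD from dd; on a 64-bit process the same layout suggests dumping with dq and taking the second QWORD instead.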

I think you can play pretty much the same game with kernel32!UnhandledExceptionFilter and ntdll!RtlDispatchException, but I've yet to try it in anger.

Hope this helps (either you or me).