Crash Course in Crash Grouping

January 10th, 2022
Bobby Galli

Supporting large applications with enormous crash volumes can be a real pain in the hindquarters. It is extraordinarily difficult for organizations to optimally dispatch engineering resources without excellent data and proper tooling. At BugSplat, we recently upgraded the tooling we provide to developers so that they can group related crashes and better target their support efforts, deliver more stable applications, and deliver more value to their customers.

The Problem at Scale

Large applications generate an extraordinary number of crash reports, which are difficult to manage at scale. Complex applications with large codebases generate tons of crashes that look unrelated at first but often share a root cause. Conversely, crashes that happen in 3rd party libraries or system functions might be grouped together even though they are actually unrelated. For applications that generate large crash volumes, it may not be feasible to look at each crash individually. It is important to group related crashes so that management can get an accurate picture of which issues are most critical to fix and plan accordingly.

The Solution

Often, crashes can be logically grouped by the function at the top of the call stack when a crash occurs. However, there are several reasons why it might make sense to group at a level that is not the top of the call stack. Fortunately, BugSplat has developed simple yet powerful tooling that allows developers to determine how to group their crashes.

At BugSplat, groups of crashes can be found on the Summary page. By default, BugSplat groups crashes using the topmost level of a call stack. A set of crashes can be regrouped when the default grouping does not make sense. When a crash happens in common code, such as a 3rd party library or system function, stack frames in application code are often more useful in determining where the crash originated.
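To make the default concrete, here is a minimal C++ sketch of grouping by the topmost frame. It only illustrates the idea and is not BugSplat's implementation; the `CallStack` alias, the `GroupByTopFrame` function, and the frame strings used with it are all made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A call stack as frame names, with index 0 as the top of the stack.
using CallStack = std::vector<std::string>;

// Counts crashes per group when the topmost frame is the stack key.
// Crashes with an empty stack are skipped (an assumption of this sketch).
std::map<std::string, int> GroupByTopFrame(const std::vector<CallStack>& crashes) {
    std::map<std::string, int> groups;
    for (const CallStack& stack : crashes) {
        if (stack.empty()) continue;
        ++groups[stack.front()];
    }
    return groups;
}
```

With this rule, three crashes that all end in the same system function collapse into a single group even if they came from different application code, which is exactly the case where regrouping at a deeper frame helps.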

Crash Grouping Example

In our sample application myExampleCrasher, Widget, Gizmo, and Doohickey are all editable objects and the edits to each of these objects can be saved to the file system. These objects are loaded by an Editor which provides the Widget, Gizmo, and Doohickey objects with a file path where they should be saved when the user has completed their edits.

In this example, Widget, Gizmo, and Doohickey inherit from an abstract class (we’ll call this class “Thingy”) that requires them to provide overrides for the <void Save(LPCWSTR fileName)> function. They each do things slightly differently before saving their edits but ultimately end up calling a common function, <void FileSystemUtils::Save(LPCWSTR fileName, char buffer[])>. The common function ultimately throws an exception, triggering a program crash.
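The shape of the sample can be sketched roughly as follows. This is a reconstruction from the description above, not the actual myExampleCrasher source. The `LPCWSTR` alias stands in for the Windows typedef so the sketch compiles anywhere, the buffer contents are placeholders, the `Editor` constructor is hypothetical, and `FileSystemUtils::Save` throws unconditionally to stand in for the failure.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for the Windows LPCWSTR typedef so the sketch compiles anywhere.
using LPCWSTR = const wchar_t*;

namespace FileSystemUtils {
// Shared save routine. In this sketch it always throws, standing in for
// the failure that crashes the sample application.
void Save(LPCWSTR fileName, char buffer[]) {
    (void)fileName;
    (void)buffer;
    throw std::runtime_error("FileSystemUtils::Save failed");
}
}  // namespace FileSystemUtils

// Abstract base class: every editable object must know how to save itself.
class Thingy {
public:
    virtual ~Thingy() = default;
    virtual void Save(LPCWSTR fileName) = 0;
};

class Widget : public Thingy {
public:
    void Save(LPCWSTR fileName) override {
        char buffer[] = "widget edits";  // placeholder payload
        FileSystemUtils::Save(fileName, buffer);
    }
};

class Gizmo : public Thingy {
public:
    void Save(LPCWSTR fileName) override {
        char buffer[] = "gizmo edits";  // placeholder payload
        FileSystemUtils::Save(fileName, buffer);
    }
};

class Doohickey : public Thingy {
public:
    void Save(LPCWSTR fileName) override {
        char buffer[] = "doohickey edits";  // placeholder payload
        FileSystemUtils::Save(fileName, buffer);
    }
};

// The Editor supplies the path that each Thingy saves to.
class Editor {
public:
    explicit Editor(std::wstring filePath) : filePath_(std::move(filePath)) {}
    void Save(Thingy& thingy) const { thingy.Save(filePath_.c_str()); }

private:
    std::wstring filePath_;
};
```

Because all three overrides funnel into the same `FileSystemUtils::Save` call, every crash ends in the same frames at the top of the stack no matter which object was being saved.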

When crashes from myExampleCrasher are processed by BugSplat, the top of the call stack, or the primary stack key, shows up as <KernelBase!RaiseException+0x69>. In this case, the primary stack key is not useful as it doesn’t tell us much about why the crash actually occurred.

Diving In

BugSplat’s Crashes page shows a list of all the crashes for our myExampleCrasher sample application. Notice that we have 3 sample crashes, all with the stack key <KernelBase!RaiseException+0x69>, which isn’t particularly helpful.

Since these 3 crashes share the same stack key they are grouped together on the Summary page.

Clicking the Call Stack Explorer link provides an in-depth look at each of the code paths that caused a program crash. The number next to each node represents how many crashes executed that line of code.

The Call Stack Explorer helps you prioritize code paths so that you can address the most important crashes first. In this example, the <Save> functions in the Doohickey, Gizmo, and Widget classes are responsible for 2, 2, and 3 crashes respectively.
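One way to picture the per-node counts is as a tree of frames in which each node records how many crashes passed through it. The sketch below is an assumption about how such counts could be computed, not a description of BugSplat's internals. `StackNode`, `AddCrash`, and `CountAt` are hypothetical names, and real stacks would have more frames between the top of the stack and the `Save` functions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// A call stack as frame names, with index 0 as the top of the stack.
using CallStack = std::vector<std::string>;

// One node per frame; crashCount is how many crashes passed through it.
struct StackNode {
    int crashCount = 0;
    std::map<std::string, std::unique_ptr<StackNode>> children;
};

// Walks the stack from the top frame down, bumping the count at each node.
void AddCrash(StackNode& root, const CallStack& stack) {
    StackNode* node = &root;
    ++node->crashCount;
    for (const std::string& frame : stack) {
        std::unique_ptr<StackNode>& child = node->children[frame];
        if (!child) child.reset(new StackNode());
        node = child.get();
        ++node->crashCount;
    }
}

// Returns the crash count at the node reached by following `path`,
// or 0 if no crash took that path.
int CountAt(const StackNode& root, const CallStack& path) {
    const StackNode* node = &root;
    for (const std::string& frame : path) {
        auto it = node->children.find(frame);
        if (it == node->children.end()) return 0;
        node = it->second.get();
    }
    return node->crashCount;
}
```

Feeding in 2, 2, and 3 stacks that pass through `Doohickey::Save`, `Gizmo::Save`, and `Widget::Save` gives those same counts at the matching nodes, and their sum at the shared top frame.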

Since more of the crashes contain calls to <Widget::Save> before crashing, it makes sense to investigate this specific call stack first. If, for instance, 1000 crashes had passed through the <Doohickey::Save> function, it would make more sense to look at crashes coming from the <Doohickey> class.

To group crashes, click the link at any level of the call stack. For this example, we’re going to click the link containing <Widget::Save> to navigate to the Group Crashes page.

In this case, we want to investigate the 3 crashes that contain Widget::Save. To create a group for all crashes that match the top 5 stack frames exactly, we’ll use the Group Similar button. The Group Similar Crashes button moves every crash that contains all 5 of the stack frames displayed on the Group Crashes page into a new group. The new group can be seen on the Summary page, with the 5th frame now treated as the Stack Key.
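The matching rule can be sketched as a prefix comparison on frames. This is a minimal illustration under our own assumptions rather than BugSplat's actual code. `CrashGroup`, `StartsWithFrames`, and `GroupSimilar` are hypothetical names, and any frames between the top of the stack and `Widget::Save` would be placeholders since the article does not list them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A call stack as frame names, with index 0 as the top of the stack.
using CallStack = std::vector<std::string>;

struct CrashGroup {
    std::string stackKey;                   // deepest selected frame
    std::vector<std::size_t> crashIndexes;  // positions in the input list
};

// True when `stack` begins with exactly the frames in `selected`.
bool StartsWithFrames(const CallStack& stack, const CallStack& selected) {
    return stack.size() >= selected.size() &&
           std::equal(selected.begin(), selected.end(), stack.begin());
}

// Collects every crash whose top frames match `selected` into one group
// keyed by the last selected frame.
CrashGroup GroupSimilar(const std::vector<CallStack>& crashes,
                        const CallStack& selected) {
    CrashGroup group;
    if (selected.empty()) return group;  // nothing selected, nothing to group
    group.stackKey = selected.back();
    for (std::size_t i = 0; i < crashes.size(); ++i) {
        if (StartsWithFrames(crashes[i], selected)) {
            group.crashIndexes.push_back(i);
        }
    }
    return group;
}
```

Selecting five frames that end in `Widget::Save` yields a group whose stack key is `Widget::Save`, while crashes that diverge anywhere in those five frames stay where they were.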

The Summary view makes it much easier to see which groups are causing most of the crashes. We’ve decided to attack the <Widget::Save> function first so let’s dive in. To get started we’ll click the link under the stack key column containing the <Widget::Save> group, then click any of the crash ID links on the Key Crash page. Clicking on a crash ID will load the Crashes page.

The myExampleCrasher sample uses the BugSplat Windows Native C++ SDK which means BugSplat will display function arguments and local variables for each function in the call stack. Just for yucks, let’s expand the <Widget::Save> stack frame and see what the arguments and local variables are.

Notice the argument filePath is equal to the value /does/not/exist. In this instance, the crash we’re chasing was actually a result of the Editor passing a bad path to Widget::Save! This means that we actually want to create subkeys at 6 frames into the call stack to isolate the actual problem.

Expand the <Editor::Save> row and click Group Crashes to navigate to the next page. On the Group Crashes page, click Remove Group to remove the existing group at 5 frames deep (Widget::Save). Next, click Create Group to create a new group at 6 frames deep (Editor::Save). Click View Group to be taken to the Key Crash page.

Now that we understand the root cause of this bug, we can push an issue to our defect tracking system. If you’ve hooked BugSplat up to your defect tracker, you can create a defect for the group instead of for an individual crash. Click the Create Defect button to create a defect and associate it with every crash in the specified group.

Some Bonus Content

In this tutorial, we’ve covered how to group crashes at a level other than the top of the call stack and push a defect into your defect tracker. Grouping similar crashes is an important tool, but sometimes it can be a bit too specific. Often, when we have something we don’t care about at the top of a call stack, we want a way to create a bunch of groups automatically.

Grouping crashes by a specified level of the call stack is a way to create a bunch of similar groups in one fell swoop. We can use group by level to reset our database to the state it was in at the beginning of the exercise. Select <KernelBase!RaiseException+0x69> in the Call Stack Explorer and Group Crashes by Level at level 1. This operation will remove any and all groups of crashes containing <KernelBase!RaiseException+0x69> at the top of the call stack.

Now that we’ve reset to the default groups, let’s revisit the Call Stack Explorer page for <KernelBase!RaiseException+0x69> and click Expand All.

Notice the call stack tree branches into 3 distinct groups. Also notice that the 1st and 2nd groups both contain a call to <Editor::Save> at level 5. If we were to dig into what’s causing the crashes in each of these groups we’d notice that <Editor::Save> is passing a bad path in these instances as well. Let’s click <Editor::Save> in the 1st group and use it to automatically create Groups at level 5.

The Summary page now contains 2 instances of Editor::Save, which seems odd. Why is this?

Group by Level takes every crash with <KernelBase!RaiseException+0x69> at the top of the stack and creates a new group for each distinct frame found at the specified level. Since we had 2 groups with <Editor::Save> at level 5, they have been grouped together. The other group actually contains Gadget::Write at level 5, and earlier we created a similar group for that call stack at level 6 which was, you guessed it, Editor::Save. These are technically different groups, even though they share a common function name and line number as a Stack Key.
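The Group by Level rule can be sketched as a map from the frame found at the chosen level to a crash count. This is only an illustration under our own assumptions, not BugSplat's implementation. `GroupByLevel` is a hypothetical function, levels are taken to be 1-based with level 1 at the top of the stack, and stacks shallower than the requested level are simply left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A call stack as frame names, with index 0 as the top of the stack.
using CallStack = std::vector<std::string>;

// Maps each distinct frame at `level` to the number of crashes that have it.
// Only crashes whose top frame equals `topFrame` are regrouped. Stacks
// shallower than `level` are left out (an assumption of this sketch).
std::map<std::string, int> GroupByLevel(const std::vector<CallStack>& crashes,
                                        const std::string& topFrame,
                                        std::size_t level) {
    std::map<std::string, int> groups;
    if (level == 0) return groups;  // levels are 1-based
    for (const CallStack& stack : crashes) {
        if (stack.empty() || stack.front() != topFrame) continue;
        if (stack.size() < level) continue;
        ++groups[stack[level - 1]];
    }
    return groups;
}
```

Run at level 5 over two stacks with `Editor::Save` in that position and one with `Gadget::Write`, this produces two groups. Run at level 1, it collapses everything back into the single default group.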

This situation can happen from time to time. If you’d like to see groups like this combined there is another trick you can keep up your sleeve. Navigate to the Summary page and click the Group By button, then select “Stack Key”.
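Conceptually, grouping the Summary page by Stack Key amounts to summing the rows that share a key. The sketch below assumes a simplified row with just a key and a count. `SummaryRow` and `CombineByStackKey` are hypothetical names, and any counts passed in are example values rather than figures from the sample database.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified Summary page row: a stack key and how many crashes it holds.
struct SummaryRow {
    std::string stackKey;
    int crashCount;
};

// Merges rows that share a stack key by adding their crash counts.
std::map<std::string, int> CombineByStackKey(const std::vector<SummaryRow>& rows) {
    std::map<std::string, int> combined;
    for (const SummaryRow& row : rows) {
        combined[row.stackKey] += row.crashCount;
    }
    return combined;
}
```

Two separate rows that both show `Editor::Save` as their key come out as a single entry whose count is the sum of the two.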

We hope you found this tutorial helpful. If you have any questions please reach out to us using the in-app chat feature, or via our support email.

Thanks for using BugSplat, happy hunting!
