Introduction
The project introduced here is a mere extension of the excellent CrashRpt crash reporting system by Mike Carruth and zexspectrum, who have both written CodeProject articles (here and here). It deals with some MFC- and SysWOW64-specific pitfalls and adds support for continuing execution after a crash.
Background
Before coming across crashrpt, I had rolled my own code, partly based on Hans Dietrich's XCrashRpt. When the time came to add functionality, I looked around for well-supported but lightweight alternatives and considered several of the candidates listed in the crashrpt Wiki. To cut a long story short: I liked crashrpt best by a long shot due to its relative simplicity, which can be partly attributed to the fact that it targets Windows applications only, allowing direct use of many of the Microsoft debugging tools in crash analysis.
It turned out that integrating it with my application using the available documentation was very straightforward and that there was almost no fiddling with special cases. However, my application was created in MFC and runs on 32-bit machines, on 64-bit machines in SysWOW64, and in native 64-bit mode. As it turns out, both the use of MFC and running in SysWOW64 lead to a few issues that need to be dealt with to catch all crashes and correctly report their origin.
I also had the additional requirement of allowing users to continue execution after a crash. While this is generally considered a bad idea because the program's memory is most likely corrupted, it makes perfect sense in some scenarios. My users may have prepared for an experiment for days and run it for hours. If the application crashes because of a division by zero in some minor online analysis or because I forgot to catch an exception somewhere, this work would be in vain, even if I allowed them to save already acquired data the way Word does through Microsoft's "Dr. Watson".
Of course, this is only the second best solution next to writing better software but the mere existence of crash reporting systems shows that crashes do occur. Also in some environments, it may be necessary for users to use beta or alpha state software for their everyday work because doing it with the risk of failure is still better than not doing it at all.
Still, crashrpt was so far ahead of everything I had programmed, and it was thoroughly tested and maintained by zexspectrum, who also turned out to be extremely responsive to both proposals and the few minor issues I found in the original version. I therefore decided to introduce the required extensions to CrashRpt and used the derivative in my program. After several months of using it without problems, I feel it is time to share this comparatively small contribution and what I know about the MFC issues with the community in the hope that it turns out to be useful to someone.
Using the Code
The code can be used in the same way as crashrpt, which has excellent documentation, FAQ and Wiki entries in addition to the articles mentioned above. The additional features are optional and documented in the CrashRptEx.h header. They are also briefly described below along with some background information.
In a Nutshell
In a nutshell, the following new options and functions are added.
CRASHRPTAPI(int) crAllowContinue(DWORD dwFlags);
CRASHRPTAPI(int) crDiscardError(CR_EXCEPTION_INFO &ei);
CRASHRPTAPI(int) crHandleError(CR_EXCEPTION_INFO &ei);
The first function chooses (for the current thread) whether the crash handler should allow program execution to continue. The exact behavior is controlled by dwFlags, which is a combination of:
(1) One of
CR_INST_APP_CONTINUE (User chooses, default is termination)
CR_INST_APP_CONTINUE_DEFAULT (User chooses, default is to continue)
CR_INST_APP_CONTINUE_ONLY (Always continue)
CR_INST_APP_TERMINATE (Terminate the application)
If one of the first three is chosen, a CR_EXCEPTION_INFO is thrown instead of terminating the application. It may be caught by an appropriate catch clause in the application's call stack.
(2) ... and optionally
CR_INST_APP_CONTINUE_NOSENDER (Do not call crash sender, throw exception info)
This flag keeps the application from being terminated and the crash sender from being launched, even if CR_INST_APP_TERMINATE is chosen. Instead, the application can call crHandleError in the catch clause, which will behave according to the flags described in (1), or crDiscardError (e.g. after logging the error) to quietly continue execution independently of these flags.
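The catch side of this mechanism can be sketched in portable C++. This is only an illustration under stated assumptions: `ExceptionInfoStub` stands in for CR_EXCEPTION_INFO, `fakeCrashHandler` for the installed crash handler, and `processMessage` and `runLoop` are hypothetical names. The real types and functions are declared in CrashRptEx.h.

```cpp
#include <string>
#include <vector>

// Stand-in for CR_EXCEPTION_INFO (assumption: the real structure has more fields).
struct ExceptionInfoStub {
    int exctype;  // kind of crash that was detected
};

// Stand-in for a crash handler that is allowed to continue:
// it throws the exception information instead of terminating the process.
inline void fakeCrashHandler(int exctype) {
    throw ExceptionInfoStub{exctype};
}

// Hypothetical unit of work that "crashes" on negative input.
inline int processMessage(int msg) {
    if (msg < 0) fakeCrashHandler(1);
    return msg * 2;
}

// Message-loop-like driver: unwinding stops here, the crash is logged
// and the loop carries on with the next message.
inline std::vector<std::string> runLoop(const std::vector<int>& messages) {
    std::vector<std::string> log;
    for (int msg : messages) {
        try {
            log.push_back("ok " + std::to_string(processMessage(msg)));
        } catch (ExceptionInfoStub& ei) {
            // A real application would call crHandleError(ei) or
            // crDiscardError(ei) at this point.
            log.push_back("recovered from crash type " + std::to_string(ei.exctype));
        }
    }
    return log;
}
```

Calling `runLoop({1, -1, 3})` processes the first message, recovers from the simulated crash in the second, and still processes the third.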
int crEnableProcessCallbackFilter(BOOL bEnable);
int crProcessCallbackFilterStatus();
These functions disable or enable, and query the state of, an exception filter that quietly swallows exceptions raised in Windows callback routines in apps running under SysWOW64 (this is a known Microsoft bug, see below).
WNDPROC crInstallWndProcWrapper(pfnWndProc);
int crEnableWndProcWrapper(BOOL bEnable);
int crWndProcWrapperStatus();
These functions install (the first of these is actually a macro dealing with the peculiarities of MFC), enable, or disable a wrapper around the window procedure that implements the aforementioned catch clause. This is necessary because Microsoft's Dr. Watson is launched when an exception occurs behind a call into the (MFC?) window procedure, never giving CrashRpt a chance.
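The idea of such a wrapper can be sketched without any Windows types. This is not the CrashRptEx implementation: `CallbackStub` stands in for WNDPROC, `CrashInfoStub` for CR_EXCEPTION_INFO, and the class and function names are hypothetical. A real wrapper would also need static state, because a window procedure is a plain function pointer.

```cpp
// Stand-in for CR_EXCEPTION_INFO (assumption, for illustration only).
struct CrashInfoStub {
    int exctype;
};

// Stand-in for WNDPROC: a plain function pointer taking a message id.
using CallbackStub = long (*)(int message);

// Wraps a callback so that thrown crash information stops unwinding
// at the wrapper instead of escaping into the caller of the callback.
class WndProcWrapperStub {
public:
    explicit WndProcWrapperStub(CallbackStub inner) : inner_(inner) {}

    void enable(bool on) { enabled_ = on; }
    bool enabled() const { return enabled_; }
    int caught() const { return caught_; }

    long call(int message) {
        if (!enabled_) return inner_(message);  // pass through untouched
        try {
            return inner_(message);
        } catch (CrashInfoStub&) {
            ++caught_;  // a real wrapper would report or discard the error here
            return 0;
        }
    }

private:
    CallbackStub inner_;
    bool enabled_ = true;
    int caught_ = 0;
};

// Hypothetical callback that "crashes" on negative messages.
inline long crashingProc(int message) {
    if (message < 0) throw CrashInfoStub{2};
    return message;
}
```

With the wrapper enabled, a crash inside `crashingProc` is counted and the call returns 0. With the wrapper disabled, the exception propagates to the caller unchanged.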
CRASHRPTAPI(int) crModifyFlags(DWORD dwFlags, DWORD dwMask);
Call any time during program execution. This function modifies the flags of an already installed crash handler without re-installation (which would require re-adding all files, etc.). This is mainly there because it is needed for the above functionality but can come in handy independently.
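A flags/mask pair like this is usually applied as a masked update: only the bits selected by the mask are replaced, all others are kept. The sketch below assumes that crModifyFlags behaves this way; the article does not show its implementation, and the constant values are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical values; the real constants are defined in the CrashRpt headers.
constexpr std::uint32_t kContinue        = 0x1;
constexpr std::uint32_t kContinueDefault = 0x2;
constexpr std::uint32_t kContinueMask    = 0x3;
constexpr std::uint32_t kOtherOption     = 0x100;

// Replace only the bits selected by `mask`; leave every other flag untouched.
constexpr std::uint32_t modifyFlags(std::uint32_t current,
                                    std::uint32_t flags,
                                    std::uint32_t mask) {
    return (current & ~mask) | (flags & mask);
}
```

For example, `modifyFlags(kOtherOption | kContinue, kContinueDefault, kContinueMask)` switches the continue mode while keeping `kOtherOption` set.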
The ability to allow program execution to continue is demonstrated in both the WTL and the MFC test applications while all other features are only demonstrated in the MFC version. Let us have a look at how to integrate the new functionality in your application focusing on the latter test app.
CrashRpt was designed to catch crashes and launch the CrashSender application, which keeps the crashed process alive until it has collected all the information configured to be sent from it, then allows it to end and sends the information about the process and crash by the configured method. The CrashSender application is launched from the crash handler installed by the library. In order to allow the application to continue from a defined position (e.g., by unwinding the stack all the way to the message loop), a few changes to CrashRpt were necessary. First and foremost, the crash handlers should no longer terminate the process. Let's look at a typical crash handler as installed by CrashRpt(Ex):
LONG WINAPI CCrashHandler::SehHandler(PEXCEPTION_POINTERS pExceptionPtrs)
{
    CCrashHandler* pCrashHandler = CCrashHandler::GetCurrentProcessCrashHandler();
    ATLASSERT(pCrashHandler!=NULL);
    if(pCrashHandler!=NULL)
    {
        pCrashHandler->CrashLock(TRUE);
        CR_EXCEPTION_INFO ei;
        memset(&ei, 0, sizeof(CR_EXCEPTION_INFO));
        ei.cb = sizeof(CR_EXCEPTION_INFO);
        ei.exctype = CR_SEH_EXCEPTION;
        ei.pexcptrs = pExceptionPtrs;
#ifdef CRASHRPT_EX
        _s_HandleError(ei, pCrashHandler);
#else
        pCrashHandler->GenerateErrorReport(&ei);
        TerminateProcess(GetCurrentProcess(), 1);
#endif
    }
    return EXCEPTION_EXECUTE_HANDLER;
}
The code of the original version is still visible in the #else clause. The original error report creation and the call to terminate the process have been moved to a function because more code is involved that would otherwise be repeated in every crash handler. The new function is:
void _s_HandleError(CR_EXCEPTION_INFO &ei, CCrashHandler * pCrashHandler)
{
    // Non-zero if continuing is allowed for the current thread
    DWORD dwFlags = pCrashHandler->IsContinueAllowed();
    if (dwFlags)
    {
        // Pass the continue options on to the crash sender
        pCrashHandler->ModifyFlags(dwFlags, CR_INST_APP_CONTINUE_MASK);
        // NOSENDER: skip the crash sender and hand the info to the application
        if (dwFlags & CR_INST_APP_CONTINUE_NOSENDER)
            throw ei;
    }
    pCrashHandler->GenerateErrorReport(&ei);
    if (dwFlags)
    {
        // Wait for the crash sender; its exit code carries the user's choice
        WaitForSingleObject(ei.hSenderProcess, INFINITE);
        DWORD dwExitCode = 1;
        GetExitCodeProcess(ei.hSenderProcess, &dwExitCode);
        if (dwExitCode & CR_INST_APP_CONTINUE)
        {
            pCrashHandler->CrashLock(FALSE);
            // A later crash must be reported separately from this one
            pCrashHandler->ChangeGUID();
            throw ei;
        }
    }
    switch(ei.exctype)
    {
    case CR_CPP_NEW_OPERATOR_ERROR:
    case CR_CPP_INVALID_PARAMETER:
        pCrashHandler->CrashLock(FALSE);
        // intentional fall through
    default:
        ;
    }
    TerminateProcess(GetCurrentProcess(), 1);
}
First, the handler checks whether continuation is allowed at all (we will see later that this may differ from thread to thread and can be changed at runtime at any time) and modifies the crash handler flags to pass this information on to the crash sender when it is launched. The next block...
    if (dwFlags & CR_INST_APP_CONTINUE_NOSENDER)
        throw ei;
...is due to another new option of the system, which only makes sense if one intends to allow applications to continue: if CR_INST_APP_CONTINUE_NOSENDER is selected, the sender is not launched in the crash handler; instead, the exception information is thrown and can be caught further up the call stack. The feature and its uses will be described below in more detail. If the option is not chosen, we launch the crash sender. It will display this new option to the user and, if continuing is not set as the only option, the user can modify the default choice.
If the user has chosen to continue the application, we unlock the crash handler and change its GUID. CCrashHandler::ChangeGUID() is a new function introduced for this purpose. It is necessary because a second crash could occur in the application later, and we would like it not to overwrite the first crash in the queue and to be recognized as different by the crash analysis. The exception information is then thrown. The application can catch CR_EXCEPTION_INFO references wherever unwinding should stop and can then decide, based on the information, whether to really continue execution.
If the application is not to be continued, TerminateProcess() is called after unlocking the crash handler where necessary.
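The three possible outcomes of _s_HandleError can be summarized as a small decision function. This is a simplified model rather than library code: the bit values are hypothetical, and `Outcome` and `decide` are names introduced only for this illustration.

```cpp
#include <cstdint>

// Hypothetical bit values; the real constants come from the CrashRptEx headers.
constexpr std::uint32_t kContinueBit = 0x1;  // stands in for CR_INST_APP_CONTINUE
constexpr std::uint32_t kNoSenderBit = 0x8;  // stands in for CR_INST_APP_CONTINUE_NOSENDER

enum class Outcome {
    ThrowWithoutSender,  // exception info thrown, crash sender never launched
    ThrowAfterSender,    // crash sender ran and the user chose to continue
    Terminate            // process is terminated
};

// continueFlags:  what IsContinueAllowed() returned for the thread (0 = not allowed)
// senderExitCode: exit code of the crash sender, carrying the user's choice
inline Outcome decide(std::uint32_t continueFlags, std::uint32_t senderExitCode) {
    if (continueFlags != 0) {
        if (continueFlags & kNoSenderBit) return Outcome::ThrowWithoutSender;
        if (senderExitCode & kContinueBit) return Outcome::ThrowAfterSender;
    }
    return Outcome::Terminate;
}
```

Note that when continuing is not allowed for the thread, the sender's exit code is never consulted and the process always terminates.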
Pitfalls When Catching Crashes (Not Only) in MFC Apps
Exceptions in SysWOW64
When running 32-bit applications on 64-bit Windows, exceptions raised behind a window callback are usually swallowed. This is a Windows bug which may stay in place because it has been there for a long time and some applications may have started to rely on it. For details, see this Microsoft KB article including a hotfix and this forum entry. CrashRptEx contains the functions crEnableProcessCallbackFilter and crProcessCallbackFilterStatus, which check for the presence of the hotfix and enable/disable the Callback Filter that is responsible for swallowing the exceptions.
To ensure that exceptions pass the callback (and thus are recognized by CrashRpt), use the hotfix to disable the callback filter as follows:
int ret = crEnableProcessCallbackFilter(FALSE);
You can query the state of the hotfix at any time by calling:
int ret = crProcessCallbackFilterStatus();
A return value of zero indicates that exceptions will pass a callback.
Exceptions Behind a Windows Callback
If your application uses MFC (and possibly not only then), there are additional issues that will cause the application to crash and Microsoft's Dr. Watson to kick in before CrashRpt has a chance to identify the problem, because an exception handler (for certain exceptions) is included in the code calling the window procedure. This behavior can also be caused in applications that were not previously affected when hooking the window procedure using SetWindowsHookEx. The solution is to include
History
- 05/06/2012: Initial version