Debugging Kernel Dumps: Episode 1

There isn’t much of a difference between kernel programming in C and standard C programming, but one of the big ones is that even the smallest bug can crash your system.

If you’ve spent any amount of time writing Windows kernel drivers, you’ll know that the Blue Screen of Death (BSOD) is a very common occurrence. At first this may be discouraging, because the BSOD carries a very negative connotation. No one likes their system crashing. But for kernel developers, it provides a snapshot in time which we can use to understand not only what the bug is, but why it is happening.


The bug…

The bug that I’m going to be talking about this week originates from this one line of code:

if (!wcsstr(processName->Buffer, L"w3wp"))

This line, or something very close to it, appears in all three of the major parts of W3WProtect:

  1. Process Monitoring

  2. Registry Monitoring

  3. File System Monitoring

I use this line to ensure that the process we’re interacting with is the right process, so that we don’t mess with stuff we shouldn’t be touching. This check works fine for the most part, but once in a blue moon it would be the cause of a BSOD.

When you have a BSOD, you will get an NTSTATUS code which tells you the error that caused the crash. The status code for this BSOD was “ACCESS_VIOLATION”, but how can I get an access violation for a string that I created?


Analyzing a Crash Dump

WinDbg is essentially your one and only choice when it comes to debugging the Windows Kernel, and that is by no means a bad thing. This gem of a tool is an absolute powerhouse once you get past that initial learning curve.

When loading a crash dump into WinDbg, you’re presented with the suggestion of running !analyze -v. This is a great way to start your troubleshooting because it gives an overview of what the error message is, and the state that the kernel was in at the time of the crash. This includes registers, a stack trace, and, if you have the PDB and source code, the exact line of the problem.

Because I have the source code loaded, I can confirm the cause of the problem:

FAULTING_SOURCE_CODE:  
   409: 		//
   410: 		// If the process isn't w3wp,
   411: 		// leave it alone.
   412: 		//
>  413: 		if (!wcsstr(processName->Buffer, L"w3wp"))
   414: 			return STATUS_SUCCESS;
   415: 
   416: 
   417: 		args = (PREG_POST_OPERATION_INFORMATION)Argument2;
   418: 
SYMBOL_NAME:  w3wprotect!PtRegNotify+147

If we look at the stack trace, we can find our function about halfway down, after a chain of 12 kernel functions trying to prevent the BSOD.

fffff30d`a320ca70 fffff807`2a48f21d     : fffffafd`7ebf5000 fffff30d`a320d300 ffff8000`00000000 00000000`00000000 : nt!KiDispatchException+0x16e
fffff30d`a320d120 fffff807`2a48b405     : fffff807`2a72db00 fffff807`2a2eefbb ffffb308`2e1c4420 00000000`00000000 : nt!KiExceptionDispatch+0x11d
fffff30d`a320d300 fffff807`2a45bd73     : fffff807`2ea32907 fffff30d`a320d720 fffff30d`a320d900 ffffb308`2dfa3c10 : nt!KiPageFault+0x445
fffff30d`a320d498 fffff807`2ea32907     : fffff30d`a320d720 fffff30d`a320d900 ffffb308`2dfa3c10 00000000`00000000 : nt!wcsstr+0x13
fffff30d`a320d4a0 fffff807`2a8aa84f     : 00000000`00000000 00000000`00000001 fffff30d`a320d720 fffff807`2a721c60 : w3wprotect!PtRegNotify+0x147 [D:\driverSamples\w3wprotect\w3wprotect\registry.c @ 413] 
fffff30d`a320d510 fffff807`2a906720     : fffff30d`00000001 fffff30d`a320d720 00000000`00000000 00000000`00000001 : nt!CmpCallCallBacksEx+0x39f

I can put WinDbg within the context of that function’s stack frame using the .frame command, allowing us to see the state of the local variables. My function was frame 0x0d (13 in decimal). Looking at the frame, we get the following local variables:

1: kd> .frame 0n13;dv /t /v
0d fffff30d`a320d4a0 fffff807`2a8aa84f     w3wprotect!PtRegNotify+0x147 [D:\driverSamples\w3wprotect\w3wprotect\registry.c @ 413] 
fffff30d`a320d510 void * CallbackContext = 0x00000000`00000000
fffff30d`a320d518 void * Argument1 = 0x00000000`00000001
fffff30d`a320d520 void * Argument2 = 0xfffff30d`a320d720
fffff30d`a320d4f8 struct _REG_POST_OPERATION_INFORMATION * args = 0xfffff30d`a320d900
fffff30d`a320d4e8 struct _UNICODE_STRING * keyName = 0x00000000`00000000
fffff30d`a320d4d0 long status = 0n0
fffff30d`a320d4f0 struct _UNICODE_STRING * processName = 0xffffb308`2e4b2b30 ""

The local variables are as follows:

  • CallbackContext, Argument1 and Argument2 are the callback parameters provided when our function is called. args is Argument2 cast to a PREG_POST_OPERATION_INFORMATION, which only happens on line 417, after the faulting line.

  • keyName will be used to store the name of the registry key being modified.

  • status is a general variable used to store NTSTATUS codes and see if the previous operation failed.

  • processName is the string seen above, which we compare against L"w3wp".

Because we know the structure of a _UNICODE_STRING, we can interpret the memory at that address as that structure. Because processName is a pointer to a Unicode string, we open up the address that it points to. Looking at that, we find our problem:

FFFFB308`2E4B2B30  0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000

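There are about 500 bytes of zeroes. With the whole structure zeroed, its Length is 0 and its Buffer pointer is NULL, so wcsstr ends up reading through a null pointer.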
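To make that reading of the dump concrete, here is a minimal user-mode sketch of how the first 16 bytes at that address map onto the x64 layout of a _UNICODE_STRING. The DUMPED_UNICODE_STRING type and both helper functions are hypothetical names made up for this illustration, not part of W3WProtect or the WDK, and the sketch assumes a little-endian machine. In WinDbg itself, dt nt!_UNICODE_STRING followed by the address gives the same view.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical user-mode stand-in for the x64 layout of _UNICODE_STRING. */
typedef struct {
    uint16_t Length;         /* bytes in use, not counting any terminator */
    uint16_t MaximumLength;  /* bytes available behind Buffer */
    uint32_t Padding;        /* alignment padding before the pointer on x64 */
    uint64_t Buffer;         /* address of the UTF-16 characters */
} DUMPED_UNICODE_STRING;

/* Copy the first 16 bytes of a memory dump into the header layout. */
static DUMPED_UNICODE_STRING decode_header(const uint8_t raw[16])
{
    DUMPED_UNICODE_STRING us;
    memcpy(&us, raw, sizeof us);
    return us;
}

/* A header is only worth handing to a string routine if it points somewhere. */
static int header_is_usable(const DUMPED_UNICODE_STRING *us)
{
    return us->Buffer != 0 && us->Length != 0;
}
```

Feeding this 16 zero bytes, as in the dump above, yields a Length of 0 and a Buffer of 0, which is exactly the state that wcsstr was handed.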


Null Bytes

The reason becomes pretty clear if we look at the code that we use to get the process name:

	// Allocate memory for the string. 
	processName = (UNICODE_STRING*)ExAllocatePoolWithTag(
		PagedPool,
		PTDEF_REG_LENGTH_PROCESS_NAME,
		PTDEF_REG_TAG_PROCESS_NAME
	);
	if (!processName)
		return STATUS_SUCCESS;
	
	// Zero the memory to remove junk data. 
	RtlZeroMemory(processName, PTDEF_REG_LENGTH_PROCESS_NAME);
	
	// Get the name of the process. 
	status = ZwQueryInformationProcess(
		NtCurrentProcess(),				// Process Handle
		ProcessImageFileName,				// PROCESSINFOCLASS
		processName,					// Buffer
		PTDEF_REG_LENGTH_PROCESS_NAME,		// SizeOfBuffer
		NULL						// Return size. 
	);
	if (!NT_SUCCESS(status))
		return STATUS_SUCCESS;

Believe it or not, PTDEF_REG_LENGTH_PROCESS_NAME is set to 500. That just so happens to be the number of bytes that we zero.

So for some reason, ZwQueryInformationProcess is not returning a process name, yet our status (seen in the local variables from above) is 0x0 (STATUS_SUCCESS).
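The gap here is that NT_SUCCESS only looks at the sign of the status code, so it says nothing about whether a name was actually written. The sketch below is a user-mode model of that, not driver code: the typedef and macros mirror the well-known Windows definitions, while status_severity and name_was_returned are hypothetical helpers invented for this example.

```c
#include <stdint.h>

typedef int32_t NTSTATUS;  /* NTSTATUS is a signed 32-bit value */

/* Success and informational codes are non-negative; warnings and errors are negative. */
#define NT_SUCCESS(Status)       (((NTSTATUS)(Status)) >= 0)
#define STATUS_SUCCESS           ((NTSTATUS)0x00000000)
#define STATUS_ACCESS_VIOLATION  ((NTSTATUS)(uint32_t)0xC0000005u)

/* The top two bits hold the severity: 0 success, 1 informational, 2 warning, 3 error. */
static unsigned status_severity(NTSTATUS status)
{
    return (unsigned)((uint32_t)status >> 30);
}

/* Hypothetical helper: a successful status alone does not prove a name came back. */
static int name_was_returned(NTSTATUS status, uint16_t lengthInBytes)
{
    return NT_SUCCESS(status) && lengthInBytes != 0;
}
```

With a status of STATUS_SUCCESS and a length of 0, which is the combination seen in the dump, name_was_returned reports that there is nothing to compare against.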

It's important to note that when our callback function gets called, we operate under the context of the process trying to make the change.

So, why did we get a zeroed buffer for our process name?

Because we're operating under the context of the process that triggered the event, we can call the !process command in WinDbg to see what process we're operating under. Lo and behold, we're operating as the "System" process and therefore in kernel space.

1: kd> !process
PROCESS ffffd58d6fa6d300
    SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 001ad000  ObjectTable: ffffb30825409ec0  HandleCount: 2354.
    Image: System



Wrap up

So, it looks like we have a cause for the bug.

Hopefully, a quick solution to this error is adding an if statement that checks whether the return size (which now has to be captured instead of passing NULL) or the process name’s length is zero. In that case we simply return STATUS_SUCCESS. It’s not w3wp, so we don’t want to interact with it.

if (!NT_SUCCESS(status) ||
    retSize == 0 ||
    processName->Length == 0)
    return STATUS_SUCCESS;
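A complementary way to harden the comparison itself is to search the counted string rather than rely on wcsstr, since a UNICODE_STRING is not guaranteed to be null-terminated. The following is a minimal user-mode sketch of that idea, assuming 16-bit characters. MOCK_UNICODE_STRING and counted_contains are hypothetical names for this illustration and are not what W3WProtect actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-mode stand-in for UNICODE_STRING with 16-bit characters. */
typedef struct {
    unsigned short Length;         /* bytes in use */
    unsigned short MaximumLength;  /* bytes allocated */
    uint16_t *Buffer;              /* may be NULL, may lack a terminator */
} MOCK_UNICODE_STRING;

/* Return 1 if needle occurs in s, using only the counted length (case-sensitive). */
static int counted_contains(const MOCK_UNICODE_STRING *s,
                            const uint16_t *needle, size_t needleChars)
{
    /* Guard the exact state seen in the crash dump: zero length, NULL buffer. */
    if (s == NULL || s->Buffer == NULL || s->Length == 0)
        return 0;

    size_t hayChars = s->Length / sizeof(uint16_t);
    if (needleChars == 0 || needleChars > hayChars)
        return 0;

    /* Slide the needle across the haystack without reading past Length. */
    for (size_t i = 0; i + needleChars <= hayChars; i++) {
        if (memcmp(&s->Buffer[i], needle, needleChars * sizeof(uint16_t)) == 0)
            return 1;
    }
    return 0;
}
```

Called with the all-zero structure from the dump, this returns 0 instead of touching the NULL Buffer, so the caller can fall through to the same "not w3wp, leave it alone" path.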

After making those changes, I rebuilt the driver, set up my debugger, and waited to see if we would encounter the issue again. I set a breakpoint on the return to make sure it's being used properly.

After about 20 minutes of waiting, we got a hit on the new code, successfully catching the System process. With that done, I can put the bug in the done pile and move on to the next.
