One of our customers recently reported a problem with running our software in tandem with VMware vShield. Although the solution turned out to be quite simple, the investigation process is worth writing a blog post about, since it revealed some interesting facts related to driver behavior at boot time.
The main idea behind VMware vShield is to provide antivirus services to virtual machines without installing a full antivirus solution on guest operating systems. Input data from virtual machines is obtained from a small driver named vsepflt.sys and called vShield Endpoint Thin Agent. The driver registers itself as a file system minifilter and then communicates with the antivirus solution on the host system. Unfortunately, when our software was running inside a virtual machine, the vsepflt.sys driver mysteriously refused to start at boot time, although it worked fine when loaded manually.
We thought the manual load solutions wasn’t good enough, because even though it would probably work fine in practice, it might indicate something nasty inside our software. All we knew was that the vsepflt.sys driver initialization failed with the error code 0x80000011 (STATUS_DEVICE_BUSY).
Since there was no direct or indirect indication where the problem might be in our software and it wasn’t easily possible to examine the exact conditions on the customer's machine during the system boot, I decided to look into the vsepflt.sys driver a bit. The driver (at least its initialization routines) turned out to be quite well-written – all error paths in its DriverEntry function (the routine responsible for driver initialization) seemed to produce a debug print saying what probably went wrong. Routine names were usually included within the debug messages which made the code more readable. Also, routines responsible for debug prints suggested that the messages aren’t just sent to the debugger, but also to another place outside the virtual machine where the customer can collect them and provide them to us (enabling debug logging for the VMware Tools vShield Endpoint thin agent driver). After a while this actually happened, and after instructing VMware to produce more detailed logs, the following debug message was discovered:
Listing 1: Debug messages from vsepflt.sys
DEBUG: VFileUmcReadParams : umcMsgTimeoutMs: 2000
DEBUG: VFileUmcReadParams : umcMsgTimeout100Ns: -20000000
INFO: DriverEntry : Created EPSec control device: 0xfffffa8003e1a6a0.
DEBUG: VSockLayer_Init : Initializing VSockLayer module
ERROR: VSockLayer_Init : VMCISock_WskCaptureProviderNPI: 0x80000011
ERROR: VFileSocketMgr_Init : Failed to initialize vsock layer: 0x80000011
ERROR: DriverEntry : VFileSocketMgr_Init() failed: 0x80000011
AUDIT: DriverEntry : vfileFilter build-2530600 load failed: 0x80000011
The message indicated that a call to the WskCaptureProviderNPI routine failed with a STATUS_DEVICE_BUSY error code. Those of you familiar with the Winsock Kernel socket-like interface introduced on Windows Vista that allows kernel components to conveniently communicate over network know that the routine with the same name needs to be called when initializing the client side of the interface (registering a Winsock Kernel application). However, it didn’t prove to be that simple. The vsepflt.sys driver does not use Winsock Kernel (that would cause problems with pre-Vista compatibility) and neither of the WSK initialization routines (WskRegister, WskCaptureProviderNPI) contain a code path which obviously contained the error code we had seen. So, a deeper analysis was required.
It turned out that the vsepflt.sys driver uses the VMware vSockets interface to communicate with software on the host operating system. VMware vSockets is a socket-like interface made specifically for that purpose. It consists of a kernel driver and a DLL that registers as Winsock Service Provider (WSP), so an application may use standard socket functions to utilize the interface; it just needs to specify a special address family constant when creating a socket. The good news was that the vSockets interface is documented. Unfortunately, there was also bad news: the documentation clearly says that the interface is intended to be used only by applications and not by third-party kernel components.
That caused quite a problem since we had speculated about creating a simple file system minifilter driver that would act like the vsepflt.sys, at least during the initialization phase. Fortunately, we knew quite precisely where things went wrong inside vsepflt.sys (the debug messages helped here). Further reverse engineering revealed that the failure is caused by sending an IOCTL request to device object named \Device\vmci served by \Driver\vsock, the driver implementing the kernel part of VMware vSockets. The discovery allowed us to create a test driver that attempted to send the same IOCTL during its initialization. The code sending the IOCTL is shown on Listing 2. You can see that the IOCTL is not very hard to build since it just needs to zero the whole input buffer except the first DWORD, which needs to be set to 1. Things are a little bit more complicated inside vsepflt.sys; there is probably a static library that handles vSockets operations but that hardly matters at all if we are only interested in sending the IOCTL.
Listing 2: Sending the WskCaptureProviderNPI-like IOCTL (for 32-bit Windows versions)
NTSTATUS _ConnectVSocks(void)
{
LARGE_INTEGER timeout;
PIRP irp = NULL;
PIO_STACK_LOCATION irpStack = NULL;
KEVENT event;
IO_STATUS_BLOCK iosb;
PDEVICE_OBJECT deviceObject = NULL;
PFILE_OBJECT fileObject = NULL;
UNICODE_STRING uDeviceName;
NTSTATUS status = STATUS_UNSUCCESSFUL;
timeout.QuadPart = -10000000;
RtlInitUnicodeString(&uDeviceName, L"\\DosDevices\\vmci");
for (SIZE_T i = 0; i < 100; ++i) {
status = IoGetDeviceObjectPointer(&uDeviceName, 0x1, &fileObject, &deviceObject);
if (NT_SUCCESS(status)) {
ULONG inputBuffer[0x5C / sizeof(ULONG)];
memset(inputBuffer, 0, 0x5C);
inputBuffer[0] = 1;
KeInitializeEvent(&event, NotificationEvent, FALSE);
irp = IoBuildDeviceIoControlRequest(0x810320C0, deviceObject, inputBuffer, sizeof(inputBuffer), inputBuffer, sizeof(inputBuffer), FALSE, &event, &iosb);
if (irp != NULL) {
irpStack = IoGetNextIrpStackLocation(irp);
irpStack->FileObject = fileObject;
status = IoCallDriver(deviceObject, irp);
if (status == STATUS_PENDING) {
(VOID)KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
status = iosb.Status;
}
if (NT_SUCCESS(status))
break;
} else {
status = STATUS_INSUFFICIENT_RESOURCES;
ObDereferenceObject(fileObject);
}
}
if (!NT_SUCCESS(status))
KeDelayExecutionThread(KernelMode, FALSE, &timeout);
}
return status;
}
After the test driver was built, we were able to successfully reproduce the problem in our environment. When our software was running, the IOCTL always failed with STATUS_DEVICE_BUSY, no matter how many attempts we made (Listing 2 indicates that the test driver attempts to send the IOCTL 100 times before giving up). The best thing about the test driver, however, was that we were able to examine the operating system state. The most interesting thing we found was the list of boot-start drivers initialized (or being initialized) that is shown on Listing 3. To compare the list with a machine containing just VMware Tools, see Listing 4. I decided to list driver objects, rather than mapped driver files, to get a better idea of which of them were already initialized.
Listing 3: drivers initialized early at boot-time on a machine running our software
0: kd> !object \Driver
Object: 8c458b08 Type: (85547b50) Directory
ObjectHeader: 8c458af0 (new version)
HandleCount: 0 PointerCount: 23
Directory Object: 8c408ed0 Name: Driver
Hash Address Type Name
---- ------- ---- ----
01 855d2a70 Driver LSI_SAS
04 8556bf38 Driver vmbus
8555c798 Driver Compbatt
85628f38 Driver msisadrv
05 855665c8 Driver mountmgr
08 8556b788 Driver atapi
09 8555d7d8 Driver volmgrx
11 8556da20 Driver amdxata
16 8556b670 Driver vsock
17 85566e40 Driver vmci
18 856481b8 Driver 00000142
855f18a8 Driver WMIxWDM
855f92d8 Driver ACPI_HAL
25 8555c670 Driver volmgr
29 85628978 Driver pci
30 8555ce10 Driver partmgr
32 8554c908 Driver ACPI
8556d030 Driver msahci
8554d320 Driver Wdf01000
33 85619668 Driver PnpManager
34 856287d0 Driver vdrvroot
36 85566f38 Driver intelide
0: kd> !drvobj \Driver\vmci
Driver object (85566e40) is for:
*** ERROR: Symbol file could not be found. Defaulted to export symbols for vmci.sys -
\Driver\vmci
Driver Extension List: (id , addr)
Device Object list:
0: kd> !object \Global??\VMCIDev
Object Global??\VMCIDev not found
Listing 4: List of initialized boot-time drivers on a nearly clean machine
0: kd> !object \Driver
Object: 8c067dd8 Type: (851ce040) Directory
ObjectHeader: 8c067dc0 (new version)
HandleCount: 0 PointerCount: 23
Directory Object: 8c004e18 Name: Driver
Hash Address Type Name
---- ------- ---- ----
01 85825460 Driver LSI_SAS
04 8539f1e8 Driver vmbus
85b60730 Driver Compbatt
85ca3330 Driver msisadrv
05 8539b1e8 Driver mountmgr
08 8537f1e8 Driver atapi
09 85cf1998 Driver volmgrx
11 85213568 Driver amdxata
16 8539e1e8 Driver vsock
17 853291e8 Driver vmci
18 852471b8 Driver 00000142
85230030 Driver WMIxWDM
8522f908 Driver ACPI_HAL
25 85b17030 Driver volmgr
29 858f3088 Driver pci
30 85a4a728 Driver partmgr
32 853971e8 Driver msahci
8514c8c0 Driver ACPI
85235db8 Driver Wdf01000
33 85222ce0 Driver PnpManager
34 85be4e88 Driver vdrvroot
36 853411e8 Driver intelide
0: kd> !drvobj \Driver\vmci
Driver object (853291e8) is for:
*** ERROR: Symbol file could not be found. Defaulted to export symbols for vmci.sys -
\Driver\vmci
Driver Extension List: (id , addr)
Device Object list:
8584c6e8 8584c890
0: kd> !object \Global??\VMCIDev
Object: 8c1cb978 Type: (851cef78) SymbolicLink
ObjectHeader: 8c1cb960 (new version)
HandleCount: 0 PointerCount: 1
Directory Object: 8c008d20 Name: VMCIDev
Target String is '\Device\VMCIHostDev'
0: kd> !devobj \Device\VMCIHostDev
Device object (8584c890) is for:
VMCIHostDev \Driver\vmci DriverObject 853291e8
Current Irp 00000000 RefCount 0 Type 00000004 Flags 00002040
Dacl 8c1fbca8 DevExt 8584c948 DevObjExt 8584c9c8
ExtensionFlags (0000000000)
Characteristics (0x00000100) FILE_DEVICE_SECURE_OPEN
AttachedTo (Lower) 8522f030 \Driver\PnpManager
Device queue is not busy.
The strange thing about Listing 3 was that none of our drivers was actually loaded, although they were registered as boot-time drivers. The vsock.sys driver was loaded and probably initialized enough to receive IOCTL requests. I decided to take a quick look into that driver too, just to see if there were any obvious dependencies with other drivers. And there was one.
The vsock.sys driver communicates with another VMware driver, vmci.sys (\Driver\vmci, do not confuse with \Device\vmci owned by \Driver\vsock) that is also started at boot time. The \DosDevices\VMCIDev symbolic link is used to establish a communication channel to the driver. As can be seen from Listing 3, no such symbolic link exists while it is present on Listing 4. So, I deduced that our software somewhat restricts the vmci.sys driver from creating its device objects early enough.
The vmci.sys driver serves a virtual PnP device of System setup class that probably represents a root bus device for all activity related to VMware Tools. Since one of our drivers registers as an upper filter for this setup class, initialization of the vmci.sys driver was postponed until all upper filters were ready (specifying Driver Load Order). That's exactly why the driver wasn’t ready when needed by vsock.sys. And that caused the STATUS_DEVICE_BUSY problem. Remember that Listing 3 shows that none of our drivers are loaded that early during boot.
The whole story came to its happy end at last. We proposed two solutions to the problem: removing our upper filter for the System device setup class or moving our drivers into a load order group, such as System Boot Extender, that loads earlier at boot time. We learned again that one is never careful enough when it comes to the initialization of boot drivers. Messing with the boot-time load order may not only bring the system into unbootable state, but can also lead to problems with much less obvious causes.
Finally, we would like to thank our customer for being really helpful when we needed additional information related to the problem. This is not always the case.
Written by Martin Drab, Developer @Safetica Technologies
Martin always digs deep to know how things work under the hood. Especially interested in low-level and kernel programming on Windows, he maintains core parts of the Device Control for you. A flawlessly working code, an interesting technical article and a good swimming time are things that please him most.
Martin always digs deep to know how things work under the hood. Especially interested in low-level and kernel programming on Windows, he maintains core parts of the Device Control for you. A flawlessly working code, an interesting technical article and a good swimming time are things that please him most.