Host-Based Exploit Defenses

9 minute read

UCL Course COMP0133: Distributed Systems and Security LEC-17

Network-based vs. Host-based

Network-based

Pros

Can see lots of traffic at one monitoring point
Can filter traffic for many vulnerable hosts

Cons

Limited information available: only packet fields, payload contents

Host-based

Pros

More information available: see effect of network request on running process’s execution
Potentially more accurate

Cons

Requires changes to host software
Performance concern (not slow down)

Design Goals

Work on executables (no source code required)
Prevent broadest possible range of exploits
Low/No false positives and false negatives
Minimal performance reduction

W \( \oplus \) X Page Protections

Background

CPU implements page protection in hardware

For each 4K memory page, permission bits specified in page table entry in kernel: read, write

If the page can be read (or read and write), it can be executable implicitly

Problem

Code supplied by user in input data
Execution transferred to user input data

User-influenced Data and Control Data are mixed

Idea

Not let CPU execute instructions stored in data pages

such that each page should either be writable or executable but NOT BOTH: W \( \oplus \) X

Text Pages: X not W
Data Pages: W not X (e.g., stack, heap)

Implementations

AMD and Intel introduced “NX” bit (no execute)

Linux PaX implements W \( \oplus \) X for x86 processors without NX bit hardware

Based on segment limit registers
Halve address space available to each process
Minor performance reduction

Limitations

Break JIT

W \( \oplus \) X Page Protection break just-in-time (JIT) code generation in legacy applications

JIT:
Applications generate codes while running,
and then jump into the generated code (e.g., run-time type checking in JavaScript)

The return-to-libc Attack

Instead of putting shellcode on stack but put arguments of function in libc there

then overwrite return address with pointer to that function

e.g., system("/bin/sh");

such that the buffer overflow attacks overwrite the return address by the address of system in text segment

and the only argument is stored on stack as part of user input

Address Space Layout Randomization

Background

To operate the return-to-libc attacks, attacker must know, predict, or guess the exact address

including shellcode buffer address, libc function address, string argument address, etc.

Idea

Randomize addresses in process to let attackers guess wrong address with high probability such that

Jump to unmapped memory: crash
Jump to invalid instruction stream: crash

Useful as efficient exploit detector (memory faults or illegal instructions suggest exploit)

Implementations

x86-32

Linux process contains three memory regions

Executable: text, init data, uninit data
Mapped: heap, dynamic (shared) libraries, thread stacks, shared memory
Stack: user stack

ASLR adds random offset to each region when process created (with 16, 16, 24 bits randomness)

This is easy to be supported by virtual memory hardware

Limitations

x86-32: Random Offset to Mapped Region is Limited to 16 Bits

highest 4 bits (28-31) cannot be changed; otherwise, it would interfere with big mmap()s
lowest 12 bits (00-11) cannot be changed; otherwise, it make mmap()ed pages not be page-aligned (should be in 4K boundary)

x86-32: Derandomization Attacks

Limitation of 16 bits random offset is easy to guess by brute force attacks in mapped region

After the random offset is known, the exact address can be predicted such that return-to-libc attacks will work again

Should know the base of mapped region without random offset (by objdump tools)
Try to return to usleep(), guessing random offset for mapped region each time

the usleep() argument is selected to 0x01010101 (16 seconds) that is long enought to observe
If guess wrong, target process crashes, the kernel closes connection immediately;

The parent process creates new child with same random offset as previous died child using fork()

In Unix, the contents of memory are copied from parent process to child process

and some of them include pointers will not be valid (not point to the right place) if addresses are changed
If guess right, target process delays in usleep(), then crashes and closes connection immediately
After the right guess, compute the exact address of target address to perform return-to-libc attacks

Details of attack is illustrated in Section 2.2.2 in paper On the Effectiveness of Address-Space Randomization

x86-64: Derandomization Attacks

The random offsets will be increased to about 40 bits

to make brute force attack much harder without attracting attention

Performance of Derandomization Attacks

The performance in x86-32 is shown in Section 2.2.3 in paper On the Effectiveness of Address-Space Randomization

For single randomization at startup, the expected number of probes is \( 2^{n-1} \)
For n-bit full randomization (parent and child) after every crash, the expected number of probes is \( 2^n \) (only twice)

The analysis is in Section 3.2 in paper On the Effectiveness of Address-Space Randomization

Taint Check

Approach

Instrument program to monitor its own execution and detect when exploit occurs

Goal

Work on binaries (no source code required)
Low false positives and false negatives
Detect wide range of exploits (new varieties all the time & point solutions unconvincing)
Help humans understand how exploit worked after the exploit

Idea

Many exploits use data supplied by user (or derived from data supplied by user)

to subvert control flow of program (e.g., modify jump, call instruction target addresses, or function return addresses)

During execution, before any control transfer instruction, validate target address not derived from user-supplied data

If it is, exploit detected and raise alarm
If it not, continue execution normally

NOT TRUST user-supplied data by marking them as tainted

and propagate taint during execution

Results of operations on tainted data should be tainted
Copies of tainted data should be tainted
Taint will be cleared when tainted data is overwritten with untainted data

Implementations

Propagatation

After I/O system calls

If reading from socket, mark targetbuffer contents as tainted
After all memory load instructions

If source memory tainted, mark register tainted

If source memory untainted, mark register untainted
After all memory store instructions

If source register tainted, mark memory tainted

If source register untainted, mark memory untainted
After all arithmetic instructions

If any operand tainted, mark result tainted

If no operands tainted, mark result untainted

Detection

Add the following checking before all control transfer instructions (e.g., jump, call, ret, etc.)

If register or memory location holding target function pointer is tainted, raise alarm
If not, continue execution normally

Shadow Memory

The taint status (flag) of each byte if memory is tracked by shadow memory

e.g., Is-Taint(addr) -> { T | F } and Taint(addr, len) or Untaint(addr, len)

Operation Mode

Fast

single bit for taint flag of each byte of memory

(if single bit taint flag of each bit of memory will be much faster without packing bits to byte)
Detailed

four-byte pointer to taint data structure (including details of system call, stack, value)

written at time of tainting and useful for exploit analysis

Expansions

can successfully detects many overwrite exploits (targeting at return address, function pointer, format string, GOT)

since it can also instrument function and system calls

Limitations

Greatly affect performance

When executing with a single request which returns small data (i.e., 1 KB),

the overhead will be multiple times (i.e., 25 times) than CGI without Valgrind (as baseline)

but the overhead of CGI (asking dynamic script on the server) is already very high due to invokings and interpreters

Moreover, the evaluation time might not be a right metric where throughput is better (can use concurrent requests)

Implicit Flows

suppose x is tainted:

if (x = 0) 
    y = 0; 
else 
    y = 1;

such that y is set by a constant that is directly influenced by x (and y will influence other values)

However, taint propagatation not include condition flags (possible false negatives)

If including condition flags, it will results in inappropriate propagation of taint (possible false positives)

Applications

Identify worm payloads

Can be configured to store trace of tainted data flow from all inputs

When exploit detected, can walk back to identify input that led to exploit

Could pass worm payloads to signature generation system, like Autograph

which is Much more accurate than port-scanner heuristic

Prevent exploit of server

Halt execution upon exploit detection

but due to high overhead, it is too slow for production servers

Moreover,

adversary may possibly be able to
detect monitored servers by their slow response time and avoid sending them exploit payload

Therefore, it might be useful to deploy on a few servers which use sample traffic as input

Endong Liu

Host-Based Exploit Defenses

Network-based vs. Host-based

Network-based

Host-based

Design Goals

W \( \oplus \) X Page Protections

Background

Problem

Idea

Implementations

Limitations

Address Space Layout Randomization

Background

Idea

Implementations

Limitations

Taint Check

Approach

Goal

Idea

Implementations

Expansions

Limitations

Applications

You may also enjoy

Academic Blockchain Papers

Web3 Ecosystem Contributions

Tornado Cash Deposit and Withdrawal via CLI

Tor Project (Tor Browser + Debian Repository)