9 minute read

UCL Course COMP0133: Distributed Systems and Security LEC-17


Network-based vs. Host-based

Network-based

Pros

  • Can see lots of traffic at one monitoring point

  • Can filter traffic for many vulnerable hosts

Cons

  • Limited information available: only packet fields, payload contents

Host-based

Pros

  • More information available: see effect of network request on running process’s execution

  • Potentially more accurate

Cons

  • Requires changes to host software

  • Performance concern (not slow down)


Design Goals

  • Work on executables (no source code required)

  • Prevent broadest possible range of exploits

  • Low/No false positives and false negatives

  • Minimal performance reduction


W \( \oplus \) X Page Protections

Background

CPU implements page protection in hardware

For each 4K memory page, permission bits specified in page table entry in kernel: read, write

If the page can be read (or read and write), it can be executable implicitly

Problem

  • Code supplied by user in input data

  • Execution transferred to user input data

User-influenced Data and Control Data are mixed

Idea

Not let CPU execute instructions stored in data pages

such that each page should either be writable or executable but NOT BOTH: W \( \oplus \) X

  • Text Pages: X not W

  • Data Pages: W not X (e.g., stack, heap)

Implementations

AMD and Intel introduced “NX” bit (no execute)

Linux PaX implements W \( \oplus \) X for x86 processors without NX bit hardware

  • Based on segment limit registers

  • Halve address space available to each process

  • Minor performance reduction

Limitations

Break JIT

W \( \oplus \) X Page Protection break just-in-time (JIT) code generation in legacy applications

JIT:
Applications generate codes while running,
and then jump into the generated code (e.g., run-time type checking in JavaScript)

The return-to-libc Attack

Instead of putting shellcode on stack but put arguments of function in libc there

then overwrite return address with pointer to that function

e.g., system("/bin/sh");

such that the buffer overflow attacks overwrite the return address by the address of system in text segment

and the only argument is stored on stack as part of user input


Address Space Layout Randomization

Background

To operate the return-to-libc attacks, attacker must know, predict, or guess the exact address

including shellcode buffer address, libc function address, string argument address, etc.

Idea

Randomize addresses in process to let attackers guess wrong address with high probability such that

  • Jump to unmapped memory: crash

  • Jump to invalid instruction stream: crash

Useful as efficient exploit detector (memory faults or illegal instructions suggest exploit)

Implementations

x86-32

Linux process contains three memory regions

  • Executable: text, init data, uninit data

  • Mapped: heap, dynamic (shared) libraries, thread stacks, shared memory

  • Stack: user stack

ASLR adds random offset to each region when process created (with 16, 16, 24 bits randomness)

This is easy to be supported by virtual memory hardware

Limitations

x86-32: Random Offset to Mapped Region is Limited to 16 Bits

  • highest 4 bits (28-31) cannot be changed; otherwise, it would interfere with big mmap()s

  • lowest 12 bits (00-11) cannot be changed; otherwise, it make mmap()ed pages not be page-aligned (should be in 4K boundary)

x86-32: Derandomization Attacks

Limitation of 16 bits random offset is easy to guess by brute force attacks in mapped region

After the random offset is known, the exact address can be predicted such that return-to-libc attacks will work again

  • Should know the base of mapped region without random offset (by objdump tools)

  • Try to return to usleep(), guessing random offset for mapped region each time

    the usleep() argument is selected to 0x01010101 (16 seconds) that is long enought to observe

  • If guess wrong, target process crashes, the kernel closes connection immediately;

    The parent process creates new child with same random offset as previous died child using fork()

    In Unix, the contents of memory are copied from parent process to child process

    and some of them include pointers will not be valid (not point to the right place) if addresses are changed

  • If guess right, target process delays in usleep(), then crashes and closes connection immediately

  • After the right guess, compute the exact address of target address to perform return-to-libc attacks

Details of attack is illustrated in Section 2.2.2 in paper On the Effectiveness of Address-Space Randomization

x86-64: Derandomization Attacks

The random offsets will be increased to about 40 bits

to make brute force attack much harder without attracting attention

Performance of Derandomization Attacks

The performance in x86-32 is shown in Section 2.2.3 in paper On the Effectiveness of Address-Space Randomization

  • For single randomization at startup, the expected number of probes is \( 2^{n-1} \)

  • For n-bit full randomization (parent and child) after every crash, the expected number of probes is \( 2^n \) (only twice)

The analysis is in Section 3.2 in paper On the Effectiveness of Address-Space Randomization


Taint Check

Approach

Instrument program to monitor its own execution and detect when exploit occurs

Goal

  • Work on binaries (no source code required)

  • Low false positives and false negatives

  • Detect wide range of exploits (new varieties all the time & point solutions unconvincing)

  • Help humans understand how exploit worked after the exploit

Idea

Many exploits use data supplied by user (or derived from data supplied by user)

to subvert control flow of program (e.g., modify jump, call instruction target addresses, or function return addresses)

During execution, before any control transfer instruction, validate target address not derived from user-supplied data

  • If it is, exploit detected and raise alarm

  • If it not, continue execution normally

NOT TRUST user-supplied data by marking them as tainted

and propagate taint during execution

  • Results of operations on tainted data should be tainted

  • Copies of tainted data should be tainted

  • Taint will be cleared when tainted data is overwritten with untainted data

Implementations

Propagatation

  • After I/O system calls

    If reading from socket, mark targetbuffer contents as tainted

  • After all memory load instructions

    If source memory tainted, mark register tainted

    If source memory untainted, mark register untainted

  • After all memory store instructions

    If source register tainted, mark memory tainted

    If source register untainted, mark memory untainted

  • After all arithmetic instructions

    If any operand tainted, mark result tainted

    If no operands tainted, mark result untainted

Detection

Add the following checking before all control transfer instructions (e.g., jump, call, ret, etc.)

  • If register or memory location holding target function pointer is tainted, raise alarm

  • If not, continue execution normally

Shadow Memory

The taint status (flag) of each byte if memory is tracked by shadow memory

e.g., Is-Taint(addr) -> { T | F } and Taint(addr, len) or Untaint(addr, len)

Operation Mode

  • Fast

    single bit for taint flag of each byte of memory

    (if single bit taint flag of each bit of memory will be much faster without packing bits to byte)

  • Detailed

    four-byte pointer to taint data structure (including details of system call, stack, value)

    written at time of tainting and useful for exploit analysis

Expansions

can successfully detects many overwrite exploits (targeting at return address, function pointer, format string, GOT)

since it can also instrument function and system calls

Limitations

Greatly affect performance

When executing with a single request which returns small data (i.e., 1 KB),

the overhead will be multiple times (i.e., 25 times) than CGI without Valgrind (as baseline)

but the overhead of CGI (asking dynamic script on the server) is already very high due to invokings and interpreters

Moreover, the evaluation time might not be a right metric where throughput is better (can use concurrent requests)

Implicit Flows

suppose x is tainted:

if (x = 0) 
    y = 0; 
else 
    y = 1;

such that y is set by a constant that is directly influenced by x (and y will influence other values)

However, taint propagatation not include condition flags (possible false negatives)

If including condition flags, it will results in inappropriate propagation of taint (possible false positives)

Applications

Identify worm payloads

Can be configured to store trace of tainted data flow from all inputs

When exploit detected, can walk back to identify input that led to exploit

Could pass worm payloads to signature generation system, like Autograph

which is Much more accurate than port-scanner heuristic

Prevent exploit of server

Halt execution upon exploit detection

but due to high overhead, it is too slow for production servers

Moreover,

adversary may possibly be able to
detect monitored servers by their slow response time and avoid sending them exploit payload

Therefore, it might be useful to deploy on a few servers which use sample traffic as input