Host-Based Exploit Defenses
UCL Course COMP0133: Distributed Systems and Security LEC-17
Network-based vs. Host-based
Network-based
Pros
-
Can see lots of traffic at one monitoring point
-
Can filter traffic for many vulnerable hosts
Cons
- Limited information available: only packet fields, payload contents
Host-based
Pros
-
More information available: see effect of network request on running process’s execution
-
Potentially more accurate
Cons
-
Requires changes to host software
-
Performance concern (not slow down)
Design Goals
-
Work on executables (no source code required)
-
Prevent broadest possible range of exploits
-
Low/No false positives and false negatives
-
Minimal performance reduction
W \( \oplus \) X Page Protections
Background
CPU implements page protection in hardware
For each 4K memory page, permission bits specified in page table entry in kernel: read, write
If the page can be read (or read and write), it can be executable implicitly
Problem
-
Code supplied by user in input data
-
Execution transferred to user input data
User-influenced Data and Control Data are mixed
Idea
Not let CPU execute instructions stored in data pages
such that each page should either be writable or executable but NOT BOTH: W \( \oplus \) X
-
Text Pages: X not W
-
Data Pages: W not X (e.g., stack, heap)
Implementations
AMD and Intel introduced “NX” bit (no execute)
Linux PaX implements W \( \oplus \) X for x86 processors without NX bit hardware
-
Based on segment limit registers
-
Halve address space available to each process
-
Minor performance reduction
Limitations
Break JIT
W \( \oplus \) X Page Protection break just-in-time (JIT) code generation in legacy applications
JIT:
Applications generate codes while running,
and then jump into the generated code (e.g., run-time type checking in JavaScript)
The return-to-libc Attack
Instead of putting shellcode on stack but put arguments of function in libc there
then overwrite return address with pointer to that function
e.g., system("/bin/sh");
such that the buffer overflow attacks overwrite the return address by the address of system
in text segment
and the only argument is stored on stack as part of user input
Address Space Layout Randomization
Background
To operate the return-to-libc attacks, attacker must know, predict, or guess the exact address
including shellcode buffer address, libc function address, string argument address, etc.
Idea
Randomize addresses in process to let attackers guess wrong address with high probability such that
-
Jump to unmapped memory: crash
-
Jump to invalid instruction stream: crash
Useful as efficient exploit detector (memory faults or illegal instructions suggest exploit)
Implementations
x86-32
Linux process contains three memory regions
-
Executable: text, init data, uninit data
-
Mapped: heap, dynamic (shared) libraries, thread stacks, shared memory
-
Stack: user stack
ASLR adds random offset to each region when process created (with 16, 16, 24 bits randomness)
This is easy to be supported by virtual memory hardware
Limitations
x86-32: Random Offset to Mapped Region is Limited to 16 Bits
-
highest 4 bits (28-31) cannot be changed; otherwise, it would interfere with big
mmap()
s -
lowest 12 bits (00-11) cannot be changed; otherwise, it make
mmap()
ed pages not be page-aligned (should be in 4K boundary)
x86-32: Derandomization Attacks
Limitation of 16 bits random offset is easy to guess by brute force attacks in mapped region
After the random offset is known, the exact address can be predicted such that return-to-libc attacks will work again
-
Should know the base of mapped region without random offset (by
objdump
tools) -
Try to return to
usleep()
, guessing random offset for mapped region each timethe
usleep()
argument is selected to 0x01010101 (16 seconds) that is long enought to observe -
If guess wrong, target process crashes, the kernel closes connection immediately;
The parent process creates new child with same random offset as previous died child using
fork()
In Unix, the contents of memory are copied from parent process to child process
and some of them include pointers will not be valid (not point to the right place) if addresses are changed
-
If guess right, target process delays in
usleep()
, then crashes and closes connection immediately -
After the right guess, compute the exact address of target address to perform return-to-libc attacks
Details of attack is illustrated in Section 2.2.2 in paper On the Effectiveness of Address-Space Randomization
x86-64: Derandomization Attacks
The random offsets will be increased to about 40 bits
to make brute force attack much harder without attracting attention
Performance of Derandomization Attacks
The performance in x86-32 is shown in Section 2.2.3 in paper On the Effectiveness of Address-Space Randomization
-
For single randomization at startup, the expected number of probes is \( 2^{n-1} \)
-
For n-bit full randomization (parent and child) after every crash, the expected number of probes is \( 2^n \) (only twice)
The analysis is in Section 3.2 in paper On the Effectiveness of Address-Space Randomization
Taint Check
Approach
Instrument program to monitor its own execution and detect when exploit occurs
Goal
-
Work on binaries (no source code required)
-
Low false positives and false negatives
-
Detect wide range of exploits (new varieties all the time & point solutions unconvincing)
-
Help humans understand how exploit worked after the exploit
Idea
Many exploits use data supplied by user (or derived from data supplied by user)
to subvert control flow of program (e.g., modify jump, call instruction target addresses, or function return addresses)
During execution, before any control transfer instruction, validate target address not derived from user-supplied data
-
If it is, exploit detected and raise alarm
-
If it not, continue execution normally
NOT TRUST user-supplied data by marking them as tainted
and propagate taint during execution
-
Results of operations on tainted data should be tainted
-
Copies of tainted data should be tainted
-
Taint will be cleared when tainted data is overwritten with untainted data
Implementations
Propagatation
-
After I/O system calls
If reading from socket, mark targetbuffer contents as tainted
-
After all memory load instructions
If source memory tainted, mark register tainted
If source memory untainted, mark register untainted
-
After all memory store instructions
If source register tainted, mark memory tainted
If source register untainted, mark memory untainted
-
After all arithmetic instructions
If any operand tainted, mark result tainted
If no operands tainted, mark result untainted
Detection
Add the following checking before all control transfer instructions (e.g., jump, call, ret, etc.)
-
If register or memory location holding target function pointer is tainted, raise alarm
-
If not, continue execution normally
Shadow Memory
The taint status (flag) of each byte if memory is tracked by shadow memory
e.g., Is-Taint(addr) -> { T | F }
and Taint(addr, len) or Untaint(addr, len)
Operation Mode
-
Fast
single bit for taint flag of each byte of memory
(if single bit taint flag of each bit of memory will be much faster without packing bits to byte)
-
Detailed
four-byte pointer to taint data structure (including details of system call, stack, value)
written at time of tainting and useful for exploit analysis
Expansions
can successfully detects many overwrite exploits (targeting at return address, function pointer, format string, GOT)
since it can also instrument function and system calls
Limitations
Greatly affect performance
When executing with a single request which returns small data (i.e., 1 KB),
the overhead will be multiple times (i.e., 25 times) than CGI without Valgrind (as baseline)
but the overhead of CGI (asking dynamic script on the server) is already very high due to invokings and interpreters
Moreover, the evaluation time might not be a right metric where throughput is better (can use concurrent requests)
Implicit Flows
suppose x
is tainted:
if (x = 0)
y = 0;
else
y = 1;
such that y
is set by a constant that is directly influenced by x
(and y
will influence other values)
However, taint propagatation not include condition flags (possible false negatives)
If including condition flags, it will results in inappropriate propagation of taint (possible false positives)
Applications
Identify worm payloads
Can be configured to store trace of tainted data flow from all inputs
When exploit detected, can walk back to identify input that led to exploit
Could pass worm payloads to signature generation system, like Autograph
which is Much more accurate than port-scanner heuristic
Prevent exploit of server
Halt execution upon exploit detection
but due to high overhead, it is too slow for production servers
Moreover,
adversary may possibly be able to
detect monitored servers by their slow response time and avoid sending them exploit payload
Therefore, it might be useful to deploy on a few servers which use sample traffic as input