10 minute read

UCL Course COMP0133: Distributed Systems and Security LEC-16


Software is Imperfect

Programs are complex, and often include subtle bugs unforeseen by the programmer

It is fundamentally hard to prevent all programmer errors

  • Design itself may use flawed logic

  • Even if logic correct, implementation may vary from programmer intent

Advanced Programming Languages

C and C++ particularly dangerous

  • Allow arbitrary manipulation of pointers

  • Require programmer-directed allocation and freeing of memory

  • Don’t provide memory safety; very difficult to reason about which portions of memory a line of C changes

  • Offer high performance, so extremely prevalent, especially in network servers and OSes

Java offers memory safety, but not a panacea

since JRE written in (many thousands of lines of) C

Vulnerabilities and Exploits

Vulnerability

input-dependent bug that can cause program to complete operations that deviate from programmer’s intent

Exploit

input that, when presented to program, triggers a particular vulnerability

Attacker can use exploit to execute operations without authorization on vulnerable host

Vulnerable program executes with some privilege level

  • Many network servers execute as superuser

  • Users run applications with their own user ID

  • Result: great opportunity for exploits to do harm


Prerequisites

Virtual Memory Map of a UNIX Process

  • Text

    executable instructions, read-only data (size fixed at compile time)

  • Data

    initialized and uninitialized data (grows towards higher addresses)

  • Stack

    function arguments and local variables (grows toward lower addresses & Last-In-First-Out)

Stack Frame

Stack Frame is the region of stack used within C function

  • Stack Pointer

    %esp (always points to the top of stack by default)

  • Frame Pointer

    %ebp (points to the base of the frame of the current function)

Function Process

  • To call function f(), allocate new stack frame

    1. Push arguments, e.g., f(a, b, c) (in reversed order)

    2. Push return address: next instruction (%eip) in caller

    3. Set %eip = address of f(); jump to callee

    4. Push saved frame pointer where %ebp is caller’s stack frame

    5. Set %ebp = %esp to set frame pointer to start of new frame

    6. Set %esp -= sizeof(locals); to allocate local variables

  • To return from f(), deallocate stack frame

    1. Set %esp += sizeof(locals); to deallocates local variables

    2. Set %ebp = saved frame pointer from stack such that it changes to caller’s stack frame

    3. Set %eip = saved return address from stack such that it returns to next instruction in caller


Buffer Overflow Vulnerabilities and Stack Smashing Exploits

Buffer Overflow

  • Buffers in C manipulated using pointers

  • C allows arbitrary arithmetic on pointers

    • Compiler has no notion of size of object pointed to

    • So programmers must explicitly check in code that pointer remains within intended object

    • But programmers often do not do so; vulnerability!

  • Buffer overflows used in many exploits

    e.g.,

    1. Input long data that runs past end of programmer’s buffer, over memory that guides program control flow

    2. Enclose code you want executed within data

    3. Overwrite control flow info with address of your code

Stack Smashing

Reason

User-influenced Data is stored together with Control Data in stack

The return address stored on stack directly influences program control flow

and local variables allocated before return address in stack frame

if programmer allocates buffer as local on stack, reads input,
and writes it into buffer without checking input fits in buffer:

  • send input containing shellcode you wish to run

  • write past end of buffer, and overwrite return address with address of malicious code within stack buffer

  • when function returns, the malicious code executes

Design Principle

There are some difficulties for attackers

  • existence of stack-allocated buffer without bounds check in program (easy for open-source code)

  • exact address for start of stack-allocated buffer (address changes)

  • exact offset of return address beyond buffer start (buffer size without source code)

However, attackers do not know these exact values

For exact address

  • Attackers can precede shellcode with NOP slide (long sequence of NOPs or equivalent instructions)

  • If programe jumps into NOP slide (NOP will be ignored), shellcode executes

For exact offset

  • Attackers can repeat shellcode’s address many times in input

  • if first instance occurs before return address’s location on stack,

    and return address will be overwritten by one instance with enough repeats

Practical Shellcode

Shellcode usually executes command /bin/sh which gives attacker a shell on exploited machine

// shellcode.c
void main()
{
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
    exit(0);                     // if execve fail, not cause core dump
}

It is easy to compile and disassemble shellcode.c to get HEX representaton of instructions (e.g., by gdb)

However, the exact address of string "/bin/sh" (first argument of execve) is needed to execute execve

Attackers can use jmp and call instruction which allow %eip-relative addressing (offset from current %eip)

Use relative offset addressing

  • add call instruction at end of shellcode, with target of first shellcode instruction, using relative addressing

  • place "/bin/sh" immediately after call instruction

  • precede first shellcode instruction with jmp to call, using relative addressing

  • call will push next instruction’s address onto stack

  • after call, stack will contain address of "/bin/sh"

Use label (easier way)

C marks end of string with NULL byte (e.g., strcpy() will stop copying
if they encounter NULL byte in shellcode instructions)

Attackers can replace instructions containing NULL bytes
with equivalent instructions that not contain NULL bytes in HEX encodings

Mitigation

  • Always explicitly check input length against target buffer size

  • Avoid C library calls that not do length checking

    e.g., sprintf(buf, …), scanf(“%s”, buf), strcpy(buf, input)

  • But use safer versions

    e.g., snprintf(buf, buflen, …), scanf(“%256s”, buf), strncpy(buf, input, 256)


Format String Vulnerabilities and Exploits

Format String Vulnerabilities

There are printf-like functions in C

  • printf(char *fmtstr, arg1, arg2, …); e.g., printf(“%d %d”, 17, 42);

  • where the first argument, format string fmtstr, specifies number and type of further arguments

If programmer allows input to be used as format string,

attacker can force printf-like function to overwrite arbitrary memory
(e.g., return address, privilege flag, Global Offset Table, etc)

Format String Exploits

Use %n to Overwrite Memory

%n format string specifier directs printf to write

the number of bytes written before this specifier

into the integer pointed to by the matching int* argument

int i;
printf("foobar%n\n", (int *) &i);  // output: foobar
printf("i = %d\n", i);             // output: i = 6 (foobar has six bytes: f, o, o, b, a, r)

Because printf increments pointer to point next argument predictively for each string specifier in fmtstr

such that attackers

  • supply the target address that attackers want to overwrite at start of fmtstr

  • add string specifiers in fmtstr to increment pointer so it points to target address that attackers want to overwrite

  • supply %n string specifier at end of fmtstr

Result: Attackers can overwrite the target address with the number of bytes before %n

Use %u to Control Written Values

%u format string specifier allows indication of exactly how many characters to output

e.g., %20u means “use 20 digits when printing this unsigned integer” which will cost 20 bytes

Practical Format String

Template

[dummy int][4 bytes target address + 0]
[dummy int][4 bytes target address + 1]
[dummy int][4 bytes target address + 2]
[dummy int][4 bytes target address + 2]
[stack pop]
%[control 1st byte value to write]u%n
%[control 2nd byte value to write]u%n
%[control 3rd byte value to write]u%n
%[control 4th byte value to write]u%n

dummy int

4 non-zero bytes that are used to consume %u

e.g., \x01\x01\x01\x01

target address

e.g.,

the target address is 0xbfff8218 (based on the little-endian)

  • 4 bytes target address + 0 is \x18\x82\xff\xbf

  • 4 bytes target address + 1 is \x19\x82\xff\xbf

  • 4 bytes target address + 2 is \x1a\x82\xff\xbf

  • 4 bytes target address + 3 is \x1b\x82\xff\xbf

stack pop

A sequence of %08x format string specifier to advance printf() argument pointer to first byte after stack pop

Each %08x shows memory in stack with 8-digit padded hexadecimal numbers,

and 2-digit padded hexadecimal numbers represents 1 byte, which means each %08x represents 4 bytes in memory

e.g.,

when printf() begins processing the format string fmtstr,

its “next argument to print” pointer points to a memory location

that is exactly 48 bytes lower in memory than the location of the buffer fmtstr

such that the number of %08x should be 48 / 4 = 12

control value

Each dummy int costs 4 bytes

Each target address costs 4 bytes

Each %08x costs 8 bytes —– different from it represents in memory

Note that the counter for %n is cumulative, and there is an algorithm to calculate next counter

int compute_padding(int write_byte, int already_written)
{
    write_byte += 0x100;
    already_written %= 0x100;
    
    int padding = (write_byte - already_written) % 0x100;
    if (padding < 10)
        padding += 0x100;
    
    return padding;
}

Attention

Because each overwritten value in each target address is determined by one %n which is an integer,

and each integer will consume 32 bits (4 bytes) in memory in 32-bit system,

such that each overwritten value will fill 4 bytes in memory

but only the value in the least significant byte (in lowest memory address) should be kept

Result

After using 4 %n format string specifiers,
there will be 4 meaningful bytes in memeory are overwritten (Write 4 Primitives)


Disclosure and Patching of Vulnerabilities

Software vendors and open-source developers audit code, release vulnerability reports

  • Usually describe vulnerability, but don’t give exploit

  • Often include announcement of patch

Race after disclosure: users patch vs attackers exploit

  • Users are often lazy or unwilling to patch

  • Attackers usually have high incentives to find vulnerabilities and design exploits

    • Arbitrary code injection allow:

      1. Defacing of widely viewed web site

      2. Stealing valuable confidential data from server

      3. Destruction of data on server

      4. Recruitment of zombies to botnets (spam, DoS)

    • Market in vulnerabilities and exploits

  • “patches” can break software, or include new vulnerabilities

Disclosure best for users: can patch or disable vs. risk of widest harm by 0-day exploits

where the undisclosed vulnerabilities called 0-day Exploits

However, preventing all exploits extremely challenging (nearly impossible)

  • Stopping one category leads attackers to use others

  • New categories continually arising