9 minute read

UCL Course COMP0133: Distributed Systems and Security LEC-18


Motivations & Risks

Users often wish to extend application with new functionality made available as a binary module

The code from these untrusted extensions will run in the address space of the application

If the extension code is malicous (or has bugs), there will be following risks

  • Overwrite trusted data or code

  • Read private data from trusted code memory

  • Execute privileged instruction

  • Call trusted functions with bad arguments

  • Jump to middle of trusted function (e.g., jump over checks)

  • Contain vulnerabilities allowing others to do above


Allowed Operatoins for Untrusted Code

If the untrusted code has its own virtual memory space, some operations will still be allowed

  • Read/Write/Execute its own memory

  • Call explicitly allowed functions in trusted code at correct entry points


Strawman Solution: Isolation with Process

Run original application (trusted) code in one process while untrusted code in another

Communicate between them by RPC between processes

Limit untrusted operations of untrusted code in its own memory not the memory of original application

Disadvantages

  • Not transparent for programmers

  • Poor performance for users


Better Solution: Software-Based Fault Isolation

Idea

Run untrusted extension code in the same process (address space) as trusted application code

Place the untrusted code and data in sandbox to prevent them from

  • writing to trusted application memory outside sandbox, and

  • transferring control to trusted application code outside sandbox

Add instructions before memory writes and jumps during compiling

to inspect their targets and constrain their behavior

Use Scenario

Developer runs sandboxer on unsafe extension code, to produce safe, sandboxed version:

  • add instructions that sandbox unsafe instructions

  • transformation done by compiler (or by binary rewriter)

Before running untrusted binary code, user runs verifier on it:

  • check that safe instructions not access memory outside extension code data

  • check that sandboxing instructions in place before all unsafe instructions

such that the sanboxer can be complicated but the verifier can be lightweight for the checking function

and the user only needs to trust the verifier regardless of sanboxer


Implementation in Sandboxer

Fault Domain

Limit untrusted code within a fault domain (still in the same address as trusted code)

  • Unique ID

    used for access control on syscalls

  • Code Segment

    virtual address range with same unique high-order bits, used to hold code

  • Data segment

    virtual address range with same unique high-order bits, used to hold data (e.g., heap, stack, static data, etc.)

Each segment has an unique high-order bits (e.g., high three bytes of address) for a segment

Untrusted code should only be able to (ensured by sandboxer)

  • jump within code segment in its fault domain

  • write within data segment in its fault domain

Two Types of Memory Addresses in Instructions

  • direct (specified statically in instruction, known before runtime)

    can be made safe statically by sandboxer during compiling

  • indirect (computed from value in register, known at runtime)

    must be made safe by sandboxer at runtime


Rewrite Instruction for Indirect Memory Addresses

For instruction STORE R0, R1 ; write R1 to memory at value in R0

Normal Mode

MOV Ra, R0 ; copy value in R0 (current target address) into Ra

SHR Rb, Ra, Rc ; store result of Ra >> Rc (current Segment ID) into Rb

CMP Rb, Rd ; compare value in Rb and Rd where Rd stores the correct Segment ID

BNE fault ; if not equal, cause fault

STORE Ra, R1 ; if equal, write R1 to memory at value in Ra

Note: the last instruction should NOT be STORE R0, R1 again

since adversaries can bypass previous check instructions by jumping directly to the last instruction

  • add 4 instructions before each indirect store

  • use 6 registers, 5 of which must be dedicated (not available to untrusted code)

    1 register to store the Segment Mask (i.e., Rc)

    2 registers to store the correct Segment ID for Code and Date seperately (i.e., Rd)

    2 registers to store the current sandboxed (might unsafe) target address for Code and Date seperately (i.e., Ra)


Fast Mode

Not checking whether the current Segment ID is correct or not but replacing the it with the correct one

AND Ra, R0, Re ; store result of R0 & Re into Ra to clear the current segment ID where Re is the Segment Mask

OR Ra, Ra, Rf ; store result of Ra | Rf into Ra to set the correct segment ID which is stored in Rf

STORE Ra, R1 ; write R1 to memory at value in Ra

Note: the Segment Mask will currently have 0 in high-order bits (Segment ID field) and 1 for low-order bits

  • add 2 instructions before each indirect store

  • use 5 dedicated registers

Similarly,

For instruction JR R0 ; jump to or call the memory at value in R0

The modification is that

AND Rg, R0, Re ; store result of R0 & Re into Rg to clear the current segment ID where Re is the Segment Mask

OR Rg, Rg, Rh ; store result of Rg | Rh into Ra to set the correct segment ID which is stored in Rh

JR Rg ; jump to or call the memory at value in Rg

one register to store the Segment Mask is the same (i.e., Re)

two registers to store the correct Segment ID
are different for Code (i.e., Rh) and Data (i.e., Rf)

two registers to store the current sandboxed (might unsafe) target address
are different for Code (i.e., Rg) and Data (i.e., Ra)


Optimizations in Sandboxer

Guard Zones

Observation-1: some instructions use “register + offset” addressing:

use register as base and supply offset for CPU to add to it

which SFI should require an additional ADD instruction to compute the base + offset

Observation-2: offsets are of limited size (\( \pm \)65,536 on MIPS) because of instruction encoding

such that SFI can surround each segment with 0xfffff (0 ~ 65535) guard zone which are unmapped pages

and access the guard zone (unmapped pages) will cause traps

Therefore, the offsets can be ignored when sandboxing
to save the additional ADD instruction to compute the base + offset

Stack Pointer

Observation: stack pointer is read far more often than written

since it is often used as base address for many “register + offset” instructions

such that SFI will not sandbox uses (read) of stack pointer as base address

but will sandbox setting (write) of stack pointer such that stack pointer always contains safe value

Therefore, it will reduce number of instructions that pay sandboxing overhead


Implementations of Verifier

After receiving sandboxed code, verifier must ensure all instructions safe

  • For direct memory addressing, check statically that Segment IDs in addresses are correct

  • For indirect memory addressing, check instructions should be preceded by full set of sandboxing instructions

  • Check there is no privileged instructions in code (e.g., system calls in the untrusted code)

  • Check the target PC-relative branches should fall in code segment

If sandboxed code fails any of these checks, verifier rejects it; otherwise, code is correctly sandboxed


Limitations of SFI in MIPS

Variable-Length Instructions in x86

MIPS instructions are fixed-length while x86 instructions are variable-length

which will cause that attackers in x86 can jump into middle of x86 instructions

e.g.,

when attackers jump into the second bytes of 25 CD 80 00 00 (AND eax, 0x80CD),

it will execute CD 80 which is int 0x80 —– a system call on Linux

such that jump into middle of x86 instructions could also jump out of fault domain into application trusted code

Fewer Registers in x86-32

There are only 4 general-purpose registers in x86 which cannot dedicate registers easily (not a problem in x86-64)


Effectiveness

Stack Smash

The sandboxer will only allow jump within Code segment in its fault domain

and the injected code is stored on the stack within Data segment

such that stack smash attacks cannot be executed

return-to-libc

Attackers can overwrite return address (through buffer overflow) with one address within Code segment in fault domain

such that return-to-libc attacks can be executed within the untrusted extension

string format vulnerability

Attackers can overwrite return address (through %n overwritten) with one address within Code segment in fault domain

such that string format vulnerability attacks can be executed within the untrusted extension

Conclusion

Because

  • SFI allows write any data into data segment of extension

  • SFI allows jump into any address in code segment of extension

such that attacker can exploit in extension (not designed for exploit but for untrusted code isolation)

and can probably cause jump out of fault domain on x86 (not designed for x86 but for MIPS)


Further Readings

  • Google’s NativeClient project implements SFI for x86-32, x86-64, and ARM

  • Control Flow Integrity for x86-32 and x86-64