Eliminating Receive Livelock

UCL Course COMP0133: Distributed Systems and Security LEC-10

Motivation

Once load passes the bottleneck, performance decreases as load increases further

Ideally, performance stays at its peak value even as load keeps increasing

Background

Event

I/O devices need to notify CPU of events

Packet arrival at network interface
Disk read complete
Key pressed on keyboard

Polling

CPU “asks” hardware device if any events have occurred (synchronous)

Requires programmed or memory-mapped I/O (relatively slow; over I/O bus)

CPU “blindly” polls device explicitly in code

to guarantee low latency, must poll very often
high CPU overhead to poll very often

For rare I/O events, CPU overhead of polling unattractive

Interrupt

Hardware device sends a signal to CPU saying “events have completed” (asynchronous)

I/O devices have dedicated wire(s) that they can use to signal interrupt(s) to CPU

On interrupt, if interrupt priority level (IPL) > CPU priority level

CPU saves state of currently running program
jumps to interrupt service routine (ISR) in kernel
invokes device driver, which asks device for events
returns to previously running program

CPU priority level: kernel-set machine state specifying which interrupts allowed (others postponed by CPU)

Interrupts well-suited to rare I/O events:
lower latency than rarely polling, lower CPU cost than constantly polling
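
The trade-off above can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement: the per-poll and per-interrupt costs are hypothetical values chosen only to make the numbers easy to follow, and the function names are illustrative.

```python
def polling_overhead(poll_interval_s, poll_cost_s):
    """CPU fraction spent polling; paid whether or not any event occurred."""
    return poll_cost_s / poll_interval_s


def interrupt_overhead(event_rate_hz, interrupt_cost_s):
    """CPU fraction spent taking interrupts; proportional to the event rate."""
    return event_rate_hz * interrupt_cost_s


# Hypothetical costs, chosen only to make the arithmetic easy to follow.
POLL_COST_S = 1e-6        # one device register read over the I/O bus
INTERRUPT_COST_S = 5e-6   # save state, run ISR, restore state

# Polling every 10 us bounds latency to about 10 us but costs about 10% of the CPU.
print(f"poll every 10 us:   {polling_overhead(10e-6, POLL_COST_S):.1%}")
# A keyboard producing 10 events/s costs almost nothing with interrupts.
print(f"10 interrupts/s:    {interrupt_overhead(10, INTERRUPT_COST_S):.4%}")
# At 200,000 events/s the same interrupt cost consumes the whole CPU.
print(f"200k interrupts/s:  {interrupt_overhead(200_000, INTERRUPT_COST_S):.1%}")
```

Polling overhead is fixed by the polling interval, while interrupt overhead scales with the event rate, which is why interrupts win for rare events and lose once events become very frequent.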

Interrupt-Driven Networking

Packet arrives
Network card interrupts at “high” IPL (because small buffers on network interfaces)
ISR looks at Ethernet header, enqueues packet for further processing, returns
“Low” IPL software interrupt dequeues packets from queue, does IP/UDP/TCP processing, enqueues data for dst process
Process reads data with read() system call

Because queues denote scheduling and priority level boundaries,

minimizing work in ISR reduces service latency for other device I/O interrupts
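
The receive path described above can be sketched as three stages separated by two queues. This is a toy model under stated assumptions: the `Host` class, the frame dictionary layout, and the queue limit of 4 are all hypothetical, and real protocol processing is reduced to copying the payload.

```python
from collections import deque


class Host:
    """Toy model of the three stages; the queue limit is a hypothetical value."""

    def __init__(self, ip_queue_limit=4):
        self.ip_queue_limit = ip_queue_limit
        self.ip_queue = deque()       # boundary: ISR -> software interrupt
        self.socket_buffer = deque()  # boundary: software interrupt -> process
        self.dropped = 0

    def receive_isr(self, frame):
        """High IPL: look at the header, enqueue, return. Drops if queue is full."""
        if frame["ethertype"] != "IPv4" or len(self.ip_queue) >= self.ip_queue_limit:
            self.dropped += 1
            return
        self.ip_queue.append(frame)

    def software_interrupt(self):
        """Low IPL: protocol processing, then queue data for the destination process."""
        while self.ip_queue:
            frame = self.ip_queue.popleft()
            self.socket_buffer.append(frame["payload"])

    def read(self):
        """Process context: stand-in for the read() system call."""
        return self.socket_buffer.popleft() if self.socket_buffer else None


host = Host()
for i in range(6):
    host.receive_isr({"ethertype": "IPv4", "payload": f"data-{i}"})
host.software_interrupt()
print(host.read(), host.dropped)  # prints: data-0 2
```

Six frames arrive before the software interrupt gets to run, so the bounded queue accepts four and the ISR drops two after already spending CPU on them.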

Receive Livelock

However, interrupts take priority over all other system processing, so

when the event rate becomes extremely high,
the system spends all its time servicing interrupts and no other work gets done

In Interrupt-Driven Networking,

As input rate increases beyond the maximum loss-free receive rate, output rate decreases
System wastes CPU preparing arriving packets for the queue, all of which are dropped
For an input burst of packets, the first packet is not delivered to user level until the whole burst has been put on the queue (e.g., leaves NFS server disk idle!)
In systems where transmit has lower priority than receive, transmit starves
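
The collapse in output rate can be reproduced with a very small steady-state model. This is a sketch under simplifying assumptions, not a model of any real kernel: the per-packet costs are hypothetical, queues are ignored, and the ISR is assumed to have strict priority over protocol processing.

```python
def delivered_per_second(arrivals, isr_cost_us=100, proto_cost_us=200,
                         cpu_us=1_000_000):
    """Packets delivered in one second of CPU time under strict interrupt priority.

    All costs are hypothetical. The ISR runs at high IPL for every arriving
    packet it can get to; protocol processing only uses whatever CPU is left.
    """
    isr_handled = min(arrivals, cpu_us // isr_cost_us)
    cpu_left = cpu_us - isr_handled * isr_cost_us
    return min(isr_handled, cpu_left // proto_cost_us)


for rate in (1_000, 3_000, 5_000, 8_000, 10_000):
    print(rate, delivered_per_second(rate))
```

With these numbers, output tracks input up to about 3,333 packets/s, then falls (2,500 delivered at 5,000 arrivals, 1,000 at 8,000) and reaches zero at 10,000 packets/s, where the ISR alone consumes the whole CPU.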

Livelock Avoidance Technique 1

Minimize Receive Interrupts

The receive ISR only

sets a flag indicating that this network interface has received one or more packets
schedules a kernel thread that polls network interfaces for received packets
leaves receive interrupts disabled
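
A minimal sketch of such an ISR follows. The `Interface` class, the `poll_wakeups` list standing in for "schedule the kernel thread", and the `packet_arrives` helper are all hypothetical names used for illustration only.

```python
class Interface:
    def __init__(self, name):
        self.name = name
        self.rx_ring = []                  # packets buffered on the card
        self.rx_interrupts_enabled = True
        self.needs_service = False         # flag set by the receive ISR


def receive_isr(iface, poll_wakeups):
    iface.needs_service = True             # 1. flag the interface
    poll_wakeups.append(iface.name)        # 2. schedule the polling thread
    iface.rx_interrupts_enabled = False    # 3. leave receive interrupts disabled


def packet_arrives(iface, packet, poll_wakeups):
    """Device side: buffer the packet; interrupt only if interrupts are enabled."""
    iface.rx_ring.append(packet)
    if iface.rx_interrupts_enabled:
        receive_isr(iface, poll_wakeups)


eth0, wakeups = Interface("eth0"), []
for n in range(5):
    packet_arrives(eth0, n, wakeups)
print(len(eth0.rx_ring), len(wakeups))  # prints: 5 1
```

Five packets arrive but only the first one costs an interrupt; the rest simply wait in the card's buffer until the polling thread runs.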

Livelock Avoidance Technique 2

Kernel Polling Thread

When scheduled, the thread checks every flagged interface for received packets and

processes each packet all the way through the kernel protocol stack (IP/forwarding/UDP/TCP),

ending at the interface output queue or the socket buffer to the application
applies a maximum quota on packets processed for the same interface in one invocation, for fairness

Under overload without a quota, the thread keeps servicing the receiving interface and never moves on to transmit

If packets arrive too fast, the input-handling callback never finishes its job. This means that the polling thread never gets to call the output-handling callback for the transmitting interface, which prevents the release of transmitter buffer descriptors for use in further packet transmissions (similar to the transmit starvation condition)

The result is actually worse in the no-quota modified kernel, because in that system, packets are discarded for lack of space on the output queue rather than on the IP input queue. The unmodified kernel does less work per discarded packet, and therefore occasionally discards them fast enough to catch up with a burst of input packets

round-robins among interfaces and between transmit and receive
re-enables an interface's receive interrupts only when no packets are pending at that interface

Under overload, the network interface drops packets once its buffering is exhausted (no CPU cost)
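
One invocation of the polling thread might look roughly like the sketch below. It is an illustration under assumptions rather than the actual kernel code: `Interface`, `poll_once`, and the `tx_done` queue (transmit buffers waiting to be reclaimed) are hypothetical names, and "processing" a packet is reduced to appending it to a log.

```python
from collections import deque


class Interface:
    def __init__(self, name, rx=(), tx_done=()):
        self.name = name
        self.rx_ring = deque(rx)            # received packets awaiting processing
        self.tx_done = deque(tx_done)       # transmit buffers awaiting reclaim
        self.rx_interrupts_enabled = False  # left disabled by the receive ISR


def poll_once(interfaces, quota):
    """Visit each interface in turn, doing at most `quota` rx and `quota` tx items."""
    log, more_work = [], False
    for iface in interfaces:
        for _ in range(quota):              # receive side, bounded by the quota
            if not iface.rx_ring:
                break
            log.append(("rx", iface.name, iface.rx_ring.popleft()))
        for _ in range(quota):              # transmit side gets its turn too
            if not iface.tx_done:
                break
            log.append(("tx", iface.name, iface.tx_done.popleft()))
        if iface.rx_ring:
            more_work = True                # still pending: keep interrupts off
        else:
            iface.rx_interrupts_enabled = True
        if iface.tx_done:
            more_work = True
    return log, more_work


eth0 = Interface("eth0", rx=range(5))
eth1 = Interface("eth1", rx=["a"], tx_done=["t0", "t1"])
log, more_work = poll_once([eth0, eth1], quota=3)
print(log, more_work)
```

With a quota of 3, the busy `eth0` gets three packets processed and then yields, so `eth1` has both its receive and transmit work done in the same pass. Only `eth1`, which has nothing pending, has its receive interrupts re-enabled.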

Livelock Avoidance Technique 3

Even with the polling thread, a user-level application still cannot run under heavy receive load, so

disable receive interrupts when the queue to the user application fills
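
This queue-state feedback can be sketched as follows. The notes only say when input is disabled; the rule used here for turning it back on (once the queue drains to a `low_water` mark) is an assumption made for the sake of a complete example, as are the class and method names.

```python
from collections import deque


class ReceivePath:
    def __init__(self, capacity, low_water):
        self.capacity = capacity            # size of the queue to the application
        self.low_water = low_water          # assumed re-enable threshold
        self.app_queue = deque()
        self.rx_interrupts_enabled = True

    def kernel_deliver(self, packet):
        """Kernel side: queue a packet for the application."""
        self.app_queue.append(packet)
        if len(self.app_queue) >= self.capacity:
            # Queue is full: stop taking input so the application can run.
            self.rx_interrupts_enabled = False

    def app_read(self):
        """Application side: consume one packet, or None if the queue is empty."""
        packet = self.app_queue.popleft() if self.app_queue else None
        if len(self.app_queue) <= self.low_water:
            # Assumption: resume input once the application has caught up.
            self.rx_interrupts_enabled = True
        return packet


path = ReceivePath(capacity=4, low_water=1)
for n in range(4):
    path.kernel_deliver(n)
print(path.rx_interrupts_enabled)  # prints: False
```

While input is disabled, further arrivals are dropped by the network interface itself, so the CPU time they would have consumed goes to the application instead.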

Summary

Scheduling vital to performance of a busy server
Poll under heavy load; use interrupts under light load

Lesson: Understanding cross-layer behavior vital to finding performance limitations and designing for high performance

Endong Liu