Eliminating Receive Livelock
UCL Course COMP0133: Distributed Systems and Security LEC-10
Motivation
Performance decreases with further increasing load after reaching the bottleneck
The ideal performance is that the performance keeps the highest value even the load increases
Background
Event
I/O devices need to notify CPU of events
-
Packet arrival at network interface
-
Disk read complete
-
Key pressed on keyboard
Polling
CPU “asks” hardware device if any events have occurred (synchronous)
Requires programmed or memory-mapped I/O (relatively slow; over I/O bus)
CPU “blindly” polls device explicitly in code
-
to guarantee low latency, must poll very often
-
high CPU overhead to poll very often
For rare I/O events, CPU overhead of polling unattractive
Interrupt
Hardware device sends a signal to CPU saying “events have completed” (asynchronous)
I/O devices have dedicated wire(s) that they can use to signal interrupt(s) to CPU
On interrupt, if interrupt priority level (IPL) > CPU priority level
-
CPU saves state of currently running program
-
jumps to interrupt service routine (ISR) in kernel
-
invokes device driver, which asks device for events
-
returns to previously running program
CPU priority level: kernel-set machine state specifying which interrupts allowed (others postponed by CPU)
Interrupts well-suited to rare I/O events:
lower latency than rarely polling, lower CPU cost than constantly polling
Interrupt-Driven Networking
-
Packet arrives
-
Network card interrupts at “high” IPL (because small buffers on network interfaces)
-
ISR looks at Ethernet header, enqueuespacket for further processing, returns
-
“Low” IPL software interrupt dequeues packets from queue, does IP/UDP/TCP processing, enqueues data for dst process
-
Process reads data with read() system call
Because queues denote scheduling and priority level boundaries,
minimizing work in ISR reduces service latency for other device I/O interrupts
Receive Livelock
However, interrupts take priority over all other system processing such that
when event rate becomes extremely high,
system spends all its time servicing interrupts, then no other work will be done
In Interrupt-Driven Networking,
-
As input rate increases beyond maximum loss-free receive rate, output rate decreases
随着输入速率增加超过最大无损接收速率,输出速率降低
-
System wastes CPU preparing arriving packets for queue, all of which dropped
系统浪费 CPU 为队列准备到达的数据包,所有这些数据包都被丢弃
-
For input burst of packets, first packet not delivered to user level until whole burst put on queue
对于数据包的输入突发,第一个数据包在整个突发放入队列之前不会传递到用户级别
(e.g., leaves NFS server disk idle!)
-
In systems where transmit lower-priority than receive, transmit starves
在传输优先级低于接收的系统中,传输不足
Livelock Avoidance Technique 1
Minimize Receive Interrupts
When receive ISR
-
sets flag indicating this network interface has received one or more packets
-
schedules kernel thread that polls network interfaces for received packets
-
leaves receive interrupts disable
Livelock Avoidance Technique 2
Kernel Polling Thread
When schedule flaged interfaces
-
process packet all the way through kernel protocol stack (IP/forwarding/UDP/TCP),
ending with interface output queue or socket buffer to application
-
maximum quota on packets processed for same interface on one invocation for fairness
When under overload without quota, it will be keeping interface receiving without moving to transmit
If packets arrive too fast, the input-handling callback never finishes its job. This means that the polling thread never gets to call the output-handling callback for the transmitting interface, which prevents the release of transmitter buffer descriptors for use in further packet transmissions (similar to the transmit starvation condition)
如果数据包到达太快,输入处理回调永远不会完成它的工作。 这意味着轮询线程永远不会调用传输接口的输出处理回调,这会阻止释放传输器缓冲区描述符以用于进一步的数据包传输(类似于传输饥饿条件)
The result is actually worse in the no-quota modified kernel, because in that system, packets are discarded for lack of space on the output queue rather than on the IP input queue. The unmodified kernel does less work per discarded packet, and therefore occasionally discards them fast enough to catch up with a burst of input packets
结果实际上在无配额修改的内核中更糟,因为在该系统中,由于输出队列上的空间不足而不是 IP 输入队列上的空间不足而丢弃了数据包。 未修改的内核对每个丢弃的数据包所做的工作较少,因此偶尔会以足够快的速度丢弃它们以赶上输入数据包的爆发
-
round-robins among interfaces and between transmit and receive
-
Re-enable interface receive interrupts only when no pending packets at that interface
When under overload, the network interface will drop packets when buffering exhausted (no CPU cost)
Livelock Avoidance Technique 3
Because user-level application still cannot run under heavy receive load
Disable receive interrupts when queue to user application fills
Summary
-
Scheduling vital to performance of a busy server
-
Polling while heavy load while Interrupt light load
Lesson: Understanding cross-layer behavior vital to finding performance limitations and designing for high performance