The success of the 5.X series hinges on the ability to deliver fine-grained threading and re-entrancy in the kernel (also known as SMPng) and kernel-supported POSIX threads in userland, while not sacrificing overall system stability or performance.
The state of SMPng and kernel lockdown is the biggest concern for 5.X. To date, few major systems have come out from under the kernel-wide mutex known as “Giant”. The SMP status page at http://www.FreeBSD.org/smp provides a comprehensive breakdown of the overall SMPng status. Status specific to SMPng progress in device drivers can be found at http://www.FreeBSD.org/projects/busdma. In summary:
VM: Kernel malloc is locked and free of Giant. The UMA zone allocator is also free of Giant. vm_object locking is in progress and is an important step to making the buffer/cache free of Giant. Pmap locking remains to be started.
GEOM: The GEOM block layer was designed to run free of Giant and allow GEOM modules and underlying block drivers to run free of Giant. Currently, only the ata(4) and aac(4) drivers are locked and run without Giant. Work on other block drivers is in progress. Locking the CAM subsystem is required for nearly all SCSI drivers to run without Giant; this work has not started yet.
Additionally, GEOM has the potential to suffer performance loss because its upcall and downcall data paths run in kernel threads. Lighter-weight context switches might help with this.
Network: Work has restarted on locking the network stack. Routing tables, ARP, bridge, IPFW, Fast-Forward, TCP, UDP, IP, Fast IPSEC, and interface layers are being targeted initially, along with several Ethernet device drivers. The socket layer, IPv6, and other protocol layers will be targeted later. The primary goal of this work is to regain the performance found in FreeBSD 4.X. The cost of context switching to the device driver ithreads and the netisr is still hampering performance.
VFS: Initial pre-cleanup started.
buffer/cache: Initial work complete on locking the buffer.
Proc: Initial proc locking is in place; further progress is expected for FreeBSD 5.2.
CAM: No significant work has occurred on the CAM SCSI layer.
Newbus: Some work has started on locking down the device_t structure.
Pipes: Complete.
File descriptors: complete.
Process accounting: jails, credentials, MAC labels, and scheduler are out from under Giant.
MAC Framework: Complete.
Timekeeping: Complete.
Kernel encryption: Crypto drivers and the core crypto(4) framework are Giant-free. KAME IPsec has not been locked.
Sound subsystem: Complete, but lock order reversal problems seem to persist.
Kernel preemption: Preemption for interrupt threads is enabled. However, because Giant still covers much of the kernel and most of the device driver interrupt routines, contention on it causes excessive context switches and might actually be hurting performance. Work is underway to explore ways to make preemption conditional.
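As a rough illustration of what "free of Giant" means in the list above, the following userland sketch uses POSIX mutexes as stand-ins for kernel mutexes. All names here (`giant`, `struct subsystem`, `subsystem_lock`, `subsystem_op`, `pipes`, `cam`) are hypothetical and are not taken from the FreeBSD kernel; it is an analogy under those assumptions, not the kernel's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userland stand-in for the kernel-wide Giant mutex (illustrative only). */
pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical subsystem record; not a FreeBSD kernel structure. */
struct subsystem {
    pthread_mutex_t *lock; /* own lock, or NULL while still under Giant */
    long ops;              /* state protected by whichever lock applies */
};

pthread_mutex_t pipe_mtx = PTHREAD_MUTEX_INITIALIZER;
struct subsystem pipes = { &pipe_mtx, 0 }; /* modeled as having its own lock */
struct subsystem cam = { NULL, 0 };        /* modeled as still under Giant */

/* Pick the lock that protects this subsystem. */
pthread_mutex_t *subsystem_lock(struct subsystem *s)
{
    assert(s != NULL);
    return s->lock != NULL ? s->lock : &giant;
}

/* Perform one operation on the subsystem under the appropriate lock. */
void subsystem_op(struct subsystem *s)
{
    pthread_mutex_t *m = subsystem_lock(s);

    pthread_mutex_lock(m);
    s->ops++;
    pthread_mutex_unlock(m);
}
```

In this model, threads working on `pipes` never contend with threads working on `cam`, while every subsystem that still has a NULL lock serializes on the single `giant` mutex.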
SMPng introduced the concept of dedicating kernel threads, known as ithreads, to servicing interrupts. With this, driver interrupt service routines are allowed to block for mutexes, memory allocations, etc. While this makes writing drivers easier, it introduces considerable latency into the system due to the complete process context switch which must be performed in order to service the ithread. This is aggravated by the extensive coverage over the kernel by the Giant mutex, and often results in multiple sleeps and context switches in order to service an interrupt. Drivers that register their interrupt as INTR_MPSAFE are less likely to feel these aggravating effects, but the overhead of doing a context switch remains. Interrupt service routines that are registered as INTR_FAST are run directly from the interrupt context and do not suffer these problems at all. However, the INTR_FAST property forces the interrupt line to be exclusive; no sharing can occur on it. The proliferation of shared interrupts on PC systems makes this undesirable.
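The sharing restriction described above can be modeled in a few lines of userland C. This is only a sketch: the `SK_`-prefixed flags, `struct intr_line`, and both functions are hypothetical names invented for illustration, and the flag values are arbitrary rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_HANDLERS 4

/* Flags named after the properties in the text; values are arbitrary. */
enum { SK_INTR_MPSAFE = 1, SK_INTR_FAST = 2 };

/* Hypothetical model of one interrupt line and its registered handlers. */
struct intr_line {
    int nhandlers;
    int flags[SK_MAX_HANDLERS];
};

/* Register a handler on a line. Returns 0 on success, -1 if rejected. */
int line_add_handler(struct intr_line *l, int flags)
{
    assert(l != NULL);
    if (l->nhandlers >= SK_MAX_HANDLERS)
        return -1;
    /* A fast handler needs the line to itself. */
    if ((flags & SK_INTR_FAST) && l->nhandlers > 0)
        return -1;
    /* A line already owned by a fast handler cannot be shared. */
    if (l->nhandlers > 0 && (l->flags[0] & SK_INTR_FAST))
        return -1;
    l->flags[l->nhandlers++] = flags;
    return 0;
}

/* Nonzero if servicing this line requires a switch to an ithread. */
int line_needs_ithread(const struct intr_line *l)
{
    assert(l != NULL);
    return l->nhandlers > 0 && !(l->flags[0] & SK_INTR_FAST);
}
```

Under this model a line either carries one fast handler and avoids the ithread switch, or carries several shared handlers and pays for it.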
Several ideas have been proposed to help combat this problem:
Special-casing ithreads to be lightweight is a possibility. This might involve reducing the amount of saved context for the ithread, stack-borrowing from another kthread, and/or creating a new fast path to avoid the mi_switch() routine.
A new interrupt model can be introduced to allow drivers to register an 'interrupt filter' along with a normal service routine. This would be similar to the Mac OS X model in use today. Interrupt filter routines would allow the driver to determine if it is interested in servicing the interrupt, allow it to squelch the interrupt source, and possibly determine and schedule service actions. It would run in the same context as the low-level interrupt service routine, so sleeping would be strictly forbidden. If actions that result in sleeping or blocking for long periods are required, the filter would signal to the caller that its normal ithread routine should be scheduled.
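The proposed filter model might look roughly like the following userland sketch. Since the proposal does not define an interface, every type, enum value, and function here is a hypothetical name chosen for illustration, and the "device" is a plain struct standing in for hardware state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts a filter can return to the low-level dispatcher. */
enum sk_filter_result {
    SK_FILTER_NOT_MINE,     /* interrupt did not come from this device */
    SK_FILTER_DONE,         /* fully serviced inside the filter */
    SK_FILTER_NEED_ITHREAD  /* schedule the normal ithread routine */
};

struct sk_intr_handler {
    enum sk_filter_result (*filter)(void *arg); /* interrupt context: must not sleep */
    void (*handler)(void *arg);                 /* ithread context: may block */
    void *arg;
};

/* Toy device state standing in for hardware registers. */
struct sk_dev {
    int pending;  /* device has raised an interrupt */
    int masked;   /* interrupt source is squelched */
    int serviced; /* completed service passes */
};

/* Filter: claim the interrupt, squelch the source, defer the real work. */
enum sk_filter_result sk_dev_filter(void *arg)
{
    struct sk_dev *d = arg;

    if (!d->pending)
        return SK_FILTER_NOT_MINE;
    d->masked = 1;
    return SK_FILTER_NEED_ITHREAD;
}

/* Normal service routine: do the work, then re-enable the source. */
void sk_dev_handler(void *arg)
{
    struct sk_dev *d = arg;

    d->serviced++;
    d->pending = 0;
    d->masked = 0;
}

/* Run every filter on a shared line; collect handlers needing an ithread. */
int sk_dispatch(struct sk_intr_handler *h, int n, struct sk_intr_handler **out)
{
    int scheduled = 0;

    assert(h != NULL && out != NULL);
    for (int i = 0; i < n; i++) {
        if (h[i].filter(h[i].arg) == SK_FILTER_NEED_ITHREAD)
            out[scheduled++] = &h[i];
    }
    return scheduled;
}
```

In this sketch only the devices whose filters ask for it incur an ithread wakeup, and the filters still allow several drivers to share one line.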
The FreeBSD 5.1 development cycle saw the KSE package jump into a highly usable state. THR, an alternate threading package based on some of the KSE kernel primitives but implementing purely 1:1 scheduling semantics, also appeared and is in a similarly experimental but usable state. Users may switch among these two libraries and the legacy libc_r library either by relinking their applications or by using the new libmap feature of the runtime linker. This excellent progress must be driven to completion before the RELENG_5 branch point so that the libc_r package can be deprecated.
The kernel and userland components for KSE and THR must be completed for all Tier-1 platforms. The decision on which thread package to sanction as the default will likely be made on a per-platform basis depending on the stability and completeness of each package.
Table 1. KSE Status
Platform | Kernel | Userland | Works? |
---|---|---|---|
i386 | YES | YES | YES |
alpha | NO | YES | NO |
sparc64 | YES | NO | NO |
ia64 | YES | YES | YES |
amd64 | YES | YES | YES |
Table 2. THR Status
Platform | Kernel | Userland | Works? |
---|---|---|---|
i386 | YES | YES | YES |
alpha | YES | YES | YES |
sparc64 | YES | YES | NO |
ia64 | YES | YES | YES |
amd64 | NO | NO | NO |
KSE must pass the ACE test suite on all Tier-1 platforms. Additional real-world testing must also be performed to ensure that the libraries are indeed useful. At a minimum, the following packages should be tested:
OpenOffice
KDE Desktop
Apache 2.x
BIND 9.2.x
MySQL
Java™ 1.4.x